-
Notifications
You must be signed in to change notification settings - Fork 7
extract_rxc_precinct function #155
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,118 @@ | ||
| #' Extract Precinct-Level Estimates from ei.MD.bayes Object | ||
| #' | ||
| #' Extracts precinct-specific ecological inference estimates from ei_rxc() output. | ||
| #' Uses exact string matching to handle variation column names | ||
| #' | ||
| #' @param eivote `ei_rxc()` output object containing `stat_objects` | ||
| #' @param cand_cols Character vector of candidate column names (e.g., `c("pct_cand_A", "pct_cand_B")`) | ||
| #' @param race_cols Character vector of race column names (e.g., `c("pct_black", "pct_white")`) | ||
| #' @param dat Original data frame used in `ei_rxc()` call | ||
| #' @param precinct_id Column name for precinct identifier (must exist in `dat`) | ||
| #' | ||
| #' @return Data frame with precinct IDs and race×candidate estimate columns | ||
| #' | ||
| #' @details | ||
| #' The function extracts `md_out$draws$Beta` from the `ei_rxc()` output, which contains | ||
| #' MCMC draws for each precinct-race-candidate combination. Beta column names follow | ||
| #' the format `"beta.race_name.cand_name.precinct_idx"`. The function computes posterior | ||
| #' means across MCMC iterations for each precinct. | ||
| #' | ||
| #' Output columns follow `expand.grid(cand, race)` ordering, with column names formatted | ||
| #' as `paste0(race, cand)` (e.g., `"pct_blackpct_cand_A"`). | ||
| #' | ||
| #' @examples | ||
| #' \donttest{ | ||
| #' | ||
| #' # library(eiCompare) | ||
| #' # data(gwinnett_ei) | ||
| #' # | ||
| #' # gwinnett_ei$precinct <- 1:nrow(gwinnett_ei) | ||
| #' # | ||
| #' # eivote <- ei_rxc( #this will take some time | ||
| #' # data = gwinnett_ei, | ||
| #' # cand_cols = c("kemp", "abrams", "metz"), | ||
| #' # race_cols = c("white", "black", "other"), | ||
| #' # totals_col = "turnout", | ||
| #' # seed = 12345 | ||
| #' #) | ||
| #' | ||
| #' # # Extract precinct-level estimates | ||
| #' # precinct_results <- extract_rxc_precinct( | ||
| #' # eivote = eivote, | ||
| #' # cand_cols = c("kemp", "abrams"), | ||
| #' # race_cols = c("white", "black", "other"), | ||
| #' # dat = gwinnett_ei, | ||
| #' # precinct_id = "precinct" | ||
| #' #) | ||
| #' | ||
| #' #head(precinct_results) | ||
| #' } | ||
| #' | ||
| #' @export | ||
| extract_rxc_precinct <- function(eivote, cand_cols, race_cols, dat, precinct_id) { | ||
|
|
||
| # Extract md_out object from ei_rxc wrapper | ||
| eiMD_object <- eivote$stat_objects[[1]] | ||
|
|
||
| # Extract Beta matrix (MCMC iterations × beta parameters) | ||
| Beta <- eiMD_object$draws$Beta | ||
|
|
||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I was messing with using dplyr functionality, which I tend toward when possible over base R matrix stuff. I used this which pretty much does it. format is slightly different but you can tweak as needed. It simplifies the code and also makes it so the user doesn't need to specify the race and cands and the other args, it would just be the rxc result object. This assumes that the naming convention will always be If you want the user to be able to specify which cands and races are in the output you can keep those args and tweak this statement to filter on them. I kind of like it clean and just have it include all and the user can just subset results if they want. But thats really just an operational preference and I defer to you. If there are runs with a lot of cands and races it would mean an output with a ton of columns so I can see wanting to keep the args.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hey good suggestions throughout. I think we should defer on the dplyr functionality, just because we did some extensive stress testing/testing for edge case names and want the function to work with messier names. We could stress test your version, but I think we should defer that and get this up and running.
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok thats ok for now. The next thing I want to do is make a bunch of package tests. Do you have some scripts of the stress testing you did? I can incorporate that in, which is important either way! Then if down the road we want to update this function we will have tests in place to make sure nothing breaks. |
||
| # Check that precinct_id column exists in dat | ||
| if(!precinct_id %in% colnames(dat)) { | ||
| stop(paste0("Column '", precinct_id, "' not found in dat. ", | ||
| "Available columns: ", paste(colnames(dat), collapse = ", "))) | ||
| } | ||
|
|
||
| n_precincts <- nrow(dat) | ||
| beta_colnames <- colnames(Beta) | ||
|
|
||
| # Initialize result matrix (precincts × race-candidate combinations) | ||
| result_matrix <- matrix(NA, | ||
| nrow = n_precincts, | ||
| ncol = length(race_cols) * length(cand_cols)) | ||
|
|
||
| # Loop through race-candidate combinations and extract precinct estimates | ||
| col_idx <- 1 | ||
| for(race in race_cols) { | ||
| for(cand in cand_cols) { | ||
|
|
||
| # Build expected prefix pattern for exact matching | ||
| # Format: beta.race.cand.precinct_number | ||
| expected_prefix <- paste0("beta.", race, ".", cand, ".") | ||
|
|
||
| # Find Beta columns matching this race-candidate pair | ||
| matching_cols <- grep(paste0("^", gsub("\\.", "\\\\.", expected_prefix)), | ||
| beta_colnames, | ||
| value = FALSE) | ||
|
|
||
| # Validation - should have exactly n_precincts matches | ||
| if(length(matching_cols) != n_precincts) { | ||
| stop(paste0("Column matching error for race='", race, "', cand='", cand, | ||
| "': found ", length(matching_cols), " columns but expected ", | ||
| n_precincts, " precincts")) | ||
| } | ||
|
|
||
| # Extract precinct indices and reorder to match dat row order | ||
| precinct_nums <- sub(expected_prefix, "", beta_colnames[matching_cols]) | ||
| precinct_order <- order(as.numeric(precinct_nums)) | ||
| matching_cols_ordered <- matching_cols[precinct_order] | ||
|
|
||
| # Calculate mean across MCMC iterations for each precinct | ||
| result_matrix[, col_idx] <- colMeans(Beta[, matching_cols_ordered]) | ||
| col_idx <- col_idx + 1 | ||
| } | ||
| } | ||
|
|
||
| # Create column names (race + candidate, matching expand.grid order) | ||
| col_names_df <- expand.grid(cand = cand_cols, race = race_cols) | ||
| col_names <- paste0(col_names_df$race, col_names_df$cand) | ||
|
|
||
| # Convert to data frame with column names | ||
| result_df <- as.data.frame(result_matrix) | ||
| colnames(result_df) <- col_names | ||
|
|
||
| # Attach precinct IDs from original data as first column | ||
| result_df <- cbind(dat[, precinct_id, drop = FALSE], result_df) | ||
|
|
||
| return(result_df) | ||
| } | ||
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove add_, the function is just called
rpv_normalize