Skip to content
Open
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 52 additions & 11 deletions reporters/ReportFpgOutputForObservationalModel.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -106,7 +106,7 @@ namespace Kernel
m_NumpyFilenameRoots = FileSystem::Concat( std::string(EnvPtr->OutputPath), std::string("roots.npy" ) );

m_GenomeDimensions.push_back( 0 );
m_GenomeDimensions.push_back( ParasiteGenetics::GetInstance()->GetNumBasePairs() );
m_GenomeDimensions.push_back( ParasiteGenetics::GetInstance()->GetIndexesBarcode().size()); // we only want SNP (barcode) data

write_numpy_file( write_func, dtype::int32, m_GenomeDimensions, m_NumpyFilenameAlleles, false );
write_numpy_file( write_func, dtype::int32, m_GenomeDimensions, m_NumpyFilenameRoots, false );
Expand Down Expand Up @@ -299,35 +299,76 @@ namespace Kernel

if( (EnvPtr->MPI.Rank == 0) && (m_GenomeList.size() > 0) )
{
// We only want the neutral alleles in these reports (the BarcodeLocations alleles)
const std::vector<int32_t>& r_barcode_indexes = ParasiteGenetics::GetInstance()->GetIndexesBarcode();
const size_t snp_data_size = r_barcode_indexes.size() * sizeof( int32_t );
// --------------------------------------------------------------------------------
// --- Define function that will write the nucleotide sequences to the numpy array
// --- file such that they can be read as a 2D array.
// --------------------------------------------------------------------------------
write_data_func write_func_alleles = [ this ]( size_t num_bytes, std::ofstream& file )
write_data_func write_func_alleles = [this, &r_barcode_indexes, &snp_data_size]( size_t num_bytes, std::ofstream& file )
Comment thread
stitova-idm marked this conversation as resolved.
Outdated

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allele and root write lambdas duplicate the same filtering logic (size check, allocate snps, loop over genomes, loop over indices, write bytes, update bytes_written). Consider extracting a shared helper that takes a function/ref returning the underlying std::vector<int32_t>& for a genome (nucleotide sequence vs allele roots). This reduces duplication and lowers the chance of future fixes being applied to only one path.

Copilot uses AI. Check for mistakes.
{
size_t bytes_written = 0;
if( m_GenomeList.size() > 0 )
{
size_t genome_size = m_GenomeList[0].GetNucleotideSequence().size() * sizeof(int32_t);
for( const ParasiteGenome& r_genome : m_GenomeList )
if(r_barcode_indexes.size() == m_GenomeList[0].GetNucleotideSequence().size())
{
file.write( (char *)r_genome.GetNucleotideSequence().data(), genome_size );
bytes_written += genome_size;
// in case we are storing only barcode SNPs as the full nucleotide sequence
for( const ParasiteGenome& r_genome : m_GenomeList )
{
file.write( (char*)r_genome.GetNucleotideSequence().data(), snp_data_size );

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
bytes_written += snp_data_size ;
}
}
else
Comment on lines +314 to +323

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using r_barcode_indexes.size() == GetNucleotideSequence().size() as a proxy for 'sequence is already barcode-only' can silently produce incorrect output if a non-barcode sequence happens to have the same length as the barcode list (or the storage format changes). A safer approach is to always select by r_barcode_indexes, or gate the fast-path behind an explicit/authoritative indicator (e.g., a config flag or a genome/parasite-genetics API that declares the stored representation is barcode-only).

Copilot uses AI. Check for mistakes.
{
std::vector<int32_t> snps;
snps.reserve( r_barcode_indexes.size() );
for(const ParasiteGenome& r_genome : m_GenomeList)
{
snps.clear();
for(auto index : r_barcode_indexes)
{
release_assert( index < r_genome.GetNucleotideSequence().size() );
snps.push_back( r_genome.GetNucleotideSequence()[index] );
}
file.write( (char*)snps.data(), snp_data_size );

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
bytes_written += snp_data_size;
}
}
}
release_assert( num_bytes == bytes_written );
};

write_data_func write_func_roots = [ this ]( size_t num_bytes, std::ofstream& file )
write_data_func write_func_roots = [ this , &r_barcode_indexes, &snp_data_size ]( size_t num_bytes, std::ofstream& file )

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both lambdas capture snp_data_size by reference even though it’s a local variable. Capturing it by value avoids potential lifetime issues (if the callable is stored/used later) and makes the capture intent clearer. Same applies to capturing r_barcode_indexes by reference if you don’t require it.

Copilot uses AI. Check for mistakes.

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The allele and root write lambdas duplicate the same filtering logic (size check, allocate snps, loop over genomes, loop over indices, write bytes, update bytes_written). Consider extracting a shared helper that takes a function/ref returning the underlying std::vector<int32_t>& for a genome (nucleotide sequence vs allele roots). This reduces duplication and lowers the chance of future fixes being applied to only one path.

Copilot uses AI. Check for mistakes.
{
size_t bytes_written = 0;
if( m_GenomeList.size() > 0 )
{
size_t genome_size = m_GenomeList[0].GetAlleleRoots().size() * sizeof(int32_t);
for( const ParasiteGenome& r_genome : m_GenomeList )
if(r_barcode_indexes.size() == m_GenomeList[0].GetAlleleRoots().size())
{
// in case we are storing only barcode SNPs as the full nucleotide sequence
for(const ParasiteGenome& r_genome : m_GenomeList)
{
file.write( (char*)r_genome.GetAlleleRoots().data(), snp_data_size );

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
bytes_written += snp_data_size;
}
}
else
{
file.write( (char *)r_genome.GetAlleleRoots().data(), genome_size );
bytes_written += genome_size;
std::vector<int32_t> snps;
snps.reserve( r_barcode_indexes.size() );
for(const ParasiteGenome& r_genome : m_GenomeList)
{
snps.clear();
for(auto index : r_barcode_indexes)
{
release_assert( index < r_genome.GetAlleleRoots().size() );
snps.push_back( r_genome.GetAlleleRoots()[index] );
}
file.write( (char*)snps.data(), snp_data_size );

Copilot AI Feb 6, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These file.write calls use C-style casts and pass a size_t length. Prefer reinterpret_cast<const char*>(...) (or std::as_bytes-style patterns where applicable) and cast the byte count to std::streamsize to match the iostream API, improving type-safety and avoiding signed/unsigned narrowing pitfalls.

Copilot uses AI. Check for mistakes.
bytes_written += snp_data_size;
}
}
}
release_assert( num_bytes == bytes_written );
Expand Down