Closed
Changes from 4 commits
5 changes: 5 additions & 0 deletions .gitattributes
@@ -10,3 +10,8 @@
.git* export-ignore
/.travis.yml export-ignore
README.md export-ignore

# Remove the text attribute from reference files, so that git doesn't convert
# line separators on Windows machines. It causes the index files to become out
# of sync with the fasta files.
*.fa* -text
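The index breakage can be made concrete. A .fai index records byte offsets into the FASTA file, so if git rewrites LF as CRLF on checkout, every line grows by one byte and offsets computed against the LF file no longer point at the same bases. The sketch below uses a hypothetical helper, `line_start_offset` (plain C, not htslib code), to compute where a given line starts in an in-memory buffer and shows that drift.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the byte offset at which line `n` (0-based)
 * starts in `buf`, or -1 if the buffer has fewer lines. Lines end at '\n',
 * so a CRLF terminator counts as two bytes, as it would on disk. */
static long line_start_offset(const char *buf, int n)
{
    assert(buf != NULL);
    size_t len = strlen(buf);
    long off = 0;
    while (n > 0) {
        const char *nl = memchr(buf + off, '\n', len - (size_t) off);
        if (nl == NULL) return -1;      /* ran out of lines */
        off = (long) (nl - buf) + 1;    /* next line starts after the LF */
        n--;
    }
    return off;
}
```

For `">chr1\nACGT\nACGT\n>chr2\nGGCC\n"`, line 4 (the first base of chr2) starts at byte 22. In the CRLF version of the same text it starts at byte 26, so an index built from one file is wrong for the other.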
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,7 +1,9 @@
*.o
*.pico
*.obj
*.dSYM
*.exe
*.dll
*.pc.tmp
*-uninstalled.pc
/version.h
2 changes: 1 addition & 1 deletion cram/cram_encode.c
@@ -1935,7 +1935,7 @@ static int cram_add_insertion(cram_container *c, cram_slice *s, cram_record *r,
}

/*
* Encodes auxiliary data.
* Encodes auxiliary data, CRAM 1.0 format.
* Returns the read-group parsed out of the BAM aux fields on success
* NULL on failure or no rg present (FIXME)
*/
6 changes: 3 additions & 3 deletions hfile.c
@@ -573,6 +573,9 @@ static hFILE *hopen_fd(const char *filename, const char *mode)

hFILE *hdopen(int fd, const char *mode)
{
#if defined HAVE_SETMODE && defined O_BINARY
if (setmode(fd, O_BINARY) < 0) return NULL;
#endif
Member

This is incorrect. See commentary in #283.

Contributor Author

Without this patch, the test cases in test_bgzf.c which use the function try_bgzf_dopen fail on Windows, because the files are opened in text mode, and then the file descriptors are passed into bgzf_dopen and hdopen.

I considered changing the test cases instead, but couldn't really see why it would be a bad thing to always set binary mode on the file descriptors. Are there any cases where it would be a problem?

Contributor

@jmarshall, can you please clarify the intention here for binary files? You state "The general policy in htslib is to open everything in binary mode", but that is precisely what this change does. Do you mean that all opens should use "rb", "wb" or similar, specifying binary mode at that point? (Mostly we do that already, of course.) That only covers fopen-style APIs, though, and not the Unix open function used in the test case. The bgzf_dopen function should perhaps add "b" to the mode string before calling hdopen, but that doesn't really fit the "everything is binary" philosophy if we only do it for bgzf and not for SAM.

Stdin/stdout, as you note, are the exceptions that clearly need a setmode() call, but there too we end up turning stdin/stdout into an hFILE via the dopen call.

Member

Relevant part from #283 (quoting @jmarshall):

I would prefer not to call setmode() in hdopen() because:

  1. hdopen() can be called with file descriptors that are not files. In particular, on Windows sockets will not respond to setmode() particularly well;
  2. file descriptors given to hdopen() may already have had I/O done on them, so it may by then be too late to call setmode().

So we should fix test_bgzf.c, and note in the documentation for hdopen() that the file descriptor passed in must already be in binary mode.

Member

[TL;DR: what Rob said.]

As noted in the #283 commentary that I already pointed you both to, there are at least two problematic cases: hdopen() can be too late to set binary mode, and it is invalid to call setmode() on some types of file descriptor given to hdopen().

If you are going to call hdopen() or bgzf_dopen() then it is your responsibility to have opened the file descriptor in binary mode.

The bug is in test_bgzf.c's try_bgzf_dopen() which should be setting O_BINARY as hfile_oflags() does, and probably (being part of htslib so kosher to include hfile_internal.h) it should just call hfile_oflags() instead of duplicating that code.
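A minimal sketch of the fix described above, assuming a POSIX-style open() and the common convention of treating O_BINARY as 0 on platforms that do not define it. `open_binary` is a hypothetical name; it does not reproduce hfile_oflags(), whose signature isn't shown in this thread.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#ifdef _WIN32
#include <io.h>        /* open/read/write/close/unlink on Windows CRTs */
#else
#include <unistd.h>    /* read/write/close/unlink on POSIX */
#endif

/* Platforms without a text/binary distinction do not define O_BINARY;
 * treating it as 0 makes the OR below a no-op there. */
#ifndef O_BINARY
#define O_BINARY 0
#endif

/* Hypothetical helper: open `path` with no newline translation. The
 * descriptor is in binary mode from the very start, before any I/O,
 * which is what callers of hdopen()/bgzf_dopen() are expected to ensure. */
static int open_binary(const char *path, int oflags)
{
    return open(path, oflags | O_BINARY, 0666);
}
```

The descriptor returned can then be handed to hdopen() or bgzf_dopen() already in binary mode, so neither function needs to call setmode() itself.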

Contributor

Good point about "open" vs "dopen" and binary mode; agreed that fixing the original open is correct.

We should probably also document this in the doxygen comments before the hdopen function, as it's currently intuitive to believe that the mode string should contain "b" and that hdopen will handle it.

hFILE_fd *fp = (hFILE_fd*) hfile_init(sizeof (hFILE_fd), mode, blksize(fd));
if (fp == NULL) return NULL;

@@ -594,9 +597,6 @@ static hFILE *hopen_fd_fileuri(const char *url, const char *mode)
static hFILE *hopen_fd_stdinout(const char *mode)
{
int fd = (strchr(mode, 'r') != NULL)? STDIN_FILENO : STDOUT_FILENO;
#if defined HAVE_SETMODE && defined O_BINARY
if (setmode(fd, O_BINARY) < 0) return NULL;
#endif
return hdopen(fd, mode);
}

2 changes: 2 additions & 0 deletions test/compare_sam.pl
@@ -64,6 +64,8 @@

# Compare lines
while ($ln1 && $ln2) {
$ln1 =~ s/\015?\012/\n/;
$ln2 =~ s/\015?\012/\n/;
chomp($ln1);
chomp($ln2);
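The two substitutions above turn a CRLF line ending into a plain LF before chomp(), so a SAM file written with Windows line endings compares equal to the LF original. An equivalent normalization in C, as a hypothetical in-place helper (not part of htslib or of this PR):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: collapse every "\r\n" in `s` to "\n", in place, and
 * return the new length. A lone '\r' not followed by '\n' is kept, matching
 * the Perl pattern s/\015?\012/\n/, which only consumes a CR before an LF. */
static size_t crlf_to_lf(char *s)
{
    assert(s != NULL);
    char *src = s, *dst = s;
    while (*src) {
        if (src[0] == '\r' && src[1] == '\n')
            src++;              /* drop the CR; the LF is copied below */
        *dst++ = *src++;
    }
    *dst = '\0';
    return (size_t) (dst - s);
}
```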

6 changes: 3 additions & 3 deletions test/hfile.c
@@ -61,9 +61,9 @@ char *slurp(const char *filename)
{
char *text;
struct stat sbuf;
size_t filesize;
FILE *f = fopen(filename, "r");
if (f == NULL) fail("fopen(\"%s\", \"r\")", filename);
size_t filesize, readsize;
FILE *f = fopen(filename, "rb");
if (f == NULL) fail("fopen(\"%s\", \"rb\")", filename);
if (fstat(fileno(f), &sbuf) != 0) fail("fstat(\"%s\")", filename);
filesize = sbuf.st_size;
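Opening with "rb" matters because on Windows a text-mode stream translates CRLF to LF, so fread() can return fewer bytes than fstat() reports. The new `readsize` variable suggests the test now checks that count, but the rest of slurp() is not shown here, so the following is only a sketch of the idea under a hypothetical name, `slurp_binary`:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical sketch: read all of `filename` into a NUL-terminated heap
 * buffer and store the byte count in *len. Returns NULL on any error,
 * including a short read. "rb" keeps the bytes untranslated, so the number
 * of bytes read should equal st_size on every platform. */
static char *slurp_binary(const char *filename, size_t *len)
{
    struct stat sbuf;
    FILE *f = fopen(filename, "rb");
    if (f == NULL) return NULL;
    if (fstat(fileno(f), &sbuf) != 0) { fclose(f); return NULL; }

    size_t filesize = (size_t) sbuf.st_size;
    char *text = malloc(filesize + 1);
    if (text == NULL) { fclose(f); return NULL; }

    size_t readsize = fread(text, 1, filesize, f);
    fclose(f);
    if (readsize != filesize) { free(text); return NULL; }  /* short read */

    text[filesize] = '\0';
    *len = filesize;
    return text;
}
```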

60 changes: 35 additions & 25 deletions test/test.pl
@@ -63,6 +63,7 @@ sub error
"Options:\n",
" -r, --redo-outputs Recreate expected output files.\n",
" -t, --temp-dir <path> When given, temporary files will not be removed.\n",
" -f, --fail-fast Fail-fast mode: exit as soon as a test fails.\n",
" -h, -?, --help This help message.\n",
"\n";
exit 1;
@@ -75,6 +76,7 @@ sub parse_params
my $ret = GetOptions (
't|temp-dir:s' => \$$opts{keep_files},
'r|redo-outputs' => \$$opts{redo_outputs},
'f|fail-fast' => \$$opts{fail_fast},
'h|?|help' => \$help
);
if ( !$ret or $help ) { error(); }
@@ -148,11 +150,13 @@ sub test_cmd
{
my @exp = <$fh>;
$exp = join('',@exp);
$exp =~ s/\015?\012/\n/g;
close($fh);
}
elsif ( !$$opts{redo_outputs} ) { failed($opts,$test,"$$opts{path}/$args{out}: $!"); return; }

if ( $exp ne $out )
(my $out_lf = $out) =~ s/\015?\012/\n/g;
if ( $exp ne $out_lf )
{
open(my $fh,'>',"$$opts{path}/$args{out}.new") or error("$$opts{path}/$args{out}.new");
print $fh $out;
@@ -180,6 +184,9 @@ sub failed
if ( defined $reason ) { print STDERR "\t$reason\n"; }
print STDERR ".. failed ...\n\n";
STDERR->flush();
if ($$opts{fail_fast}) {
die "\n";
}
}
sub passed
{
@@ -201,14 +208,17 @@ sub is_file_newer

my $test_view_failures;
sub testv {
my ($cmd) = @_;
my ($opts, $cmd) = @_;
print " $cmd\n";
my ($ret, $out) = _cmd($cmd);
if ($ret != 0) {
STDOUT->flush();
print STDERR "FAILED\n$out\n";
STDERR->flush();
$test_view_failures++;
if ($$opts{fail_fast}) {
die "\n";
}
}
}

@@ -233,50 +243,50 @@ sub test_view
$test_view_failures = 0;

# SAM -> BAM -> SAM
testv "./test_view $tv_args -S -b $sam > $bam";
testv "./test_view $tv_args $bam > $bam.sam_";
testv "./compare_sam.pl $sam $bam.sam_";
testv $opts, "./test_view $tv_args -S -b $sam > $bam";
testv $opts, "./test_view $tv_args $bam > $bam.sam_";
testv $opts, "./compare_sam.pl $sam $bam.sam_";

# SAM -> CRAM -> SAM
testv "./test_view $tv_args -t $ref -S -C $sam > $cram";
testv "./test_view $tv_args -D $cram > $cram.sam_";
testv "./compare_sam.pl $md $sam $cram.sam_";
testv $opts, "./test_view $tv_args -t $ref -S -C $sam > $cram";
testv $opts, "./test_view $tv_args -D $cram > $cram.sam_";
testv $opts, "./compare_sam.pl $md $sam $cram.sam_";

# BAM -> CRAM -> BAM -> SAM
$cram = "$bam.cram";
testv "./test_view $tv_args -t $ref -C $bam > $cram";
testv "./test_view $tv_args -b -D $cram > $cram.bam";
testv "./test_view $tv_args $cram.bam > $cram.bam.sam_";
testv "./compare_sam.pl $md $sam $cram.bam.sam_";
testv $opts, "./test_view $tv_args -t $ref -C $bam > $cram";
testv $opts, "./test_view $tv_args -b -D $cram > $cram.bam";
testv $opts, "./test_view $tv_args $cram.bam > $cram.bam.sam_";
testv $opts, "./compare_sam.pl $md $sam $cram.bam.sam_";

# SAM -> CRAM3 -> SAM
$cram = "$base.tmp.cram";
testv "./test_view $tv_args -t $ref -S -C -o VERSION=3.0 $sam > $cram";
testv "./test_view $tv_args -D $cram > $cram.sam_";
testv "./compare_sam.pl $md $sam $cram.sam_";
testv $opts, "./test_view $tv_args -t $ref -S -C -o VERSION=3.0 $sam > $cram";
testv $opts, "./test_view $tv_args -D $cram > $cram.sam_";
testv $opts, "./compare_sam.pl $md $sam $cram.sam_";

# BAM -> CRAM3 -> BAM -> SAM
$cram = "$bam.cram";
testv "./test_view $tv_args -t $ref -C -o VERSION=3.0 $bam > $cram";
testv "./test_view $tv_args -b -D $cram > $cram.bam";
testv "./test_view $tv_args $cram.bam > $cram.bam.sam_";
testv "./compare_sam.pl $md $sam $cram.bam.sam_";
testv $opts, "./test_view $tv_args -t $ref -C -o VERSION=3.0 $bam > $cram";
testv $opts, "./test_view $tv_args -b -D $cram > $cram.bam";
testv $opts, "./test_view $tv_args $cram.bam > $cram.bam.sam_";
testv $opts, "./compare_sam.pl $md $sam $cram.bam.sam_";

# CRAM3 -> CRAM2
$cram = "$base.tmp.cram";
testv "./test_view $tv_args -t $ref -C -o VERSION=2.1 $cram > $cram.cram";
testv $opts, "./test_view $tv_args -t $ref -C -o VERSION=2.1 $cram > $cram.cram";

# CRAM2 -> CRAM3
testv "./test_view $tv_args -t $ref -C -o VERSION=3.0 $cram.cram > $cram";
testv "./test_view $tv_args $cram > $cram.sam_";
testv "./compare_sam.pl $md $sam $cram.sam_";
testv $opts, "./test_view $tv_args -t $ref -C -o VERSION=3.0 $cram.cram > $cram";
testv $opts, "./test_view $tv_args $cram > $cram.sam_";
testv $opts, "./compare_sam.pl $md $sam $cram.sam_";

# Java pre-made CRAM -> SAM
my $jcram = "${base}_java.cram";
if (-e $jcram) {
my $jsam = "${base}_java.tmp.sam_";
testv "./test_view $tv_args -i reference=$ref $jcram > $jsam";
testv "./compare_sam.pl -Baux $md $sam $jsam";
testv $opts, "./test_view $tv_args -i reference=$ref $jcram > $jsam";
testv $opts, "./compare_sam.pl -Baux $md $sam $jsam";
}

if ($test_view_failures == 0)
2 changes: 2 additions & 0 deletions test/test_bgzf.c
@@ -23,6 +23,8 @@ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
*/

#include <config.h>

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
24 changes: 20 additions & 4 deletions test/test_view.c
@@ -54,23 +54,39 @@ int main(int argc, char *argv[])

while ((c = getopt(argc, argv, "IbDCSl:t:i:o:N:BZ:@:")) >= 0) {
switch (c) {
case 'S': flag |= 1; break;
case 'I': ignore_sam_err = 1; break;
Contributor

Adding usage is most welcome, thanks, but as with the prior comment, why have the options changed order? It doesn't look to be for consistency with the usage statement.

Contributor Author

anderskaplan, Mar 27, 2017

I changed the order in the switch to be consistent with the string passed to getopt, right above the switch statement. For the usage instructions I aimed at what would make most sense for the user. Ideally they should all be consistent, but I didn't touch the getopt string because I'm not familiar with how it's used. I can revert the change of order, or change all of them to match the usage, whichever you prefer.

case 'b': flag |= 2; break;
case 'D': flag |= 4; break;
case 'C': flag |= 8; break;
case 'B': benchmark = 1; break;
case 'S': flag |= 1; break;
case 'l': clevel = atoi(optarg); flag |= 2; break;
case 't': fn_ref = optarg; break;
case 'I': ignore_sam_err = 1; break;
case 'i': if (hts_opt_add(&in_opts, optarg)) return 1; break;
case 'o': if (hts_opt_add(&out_opts, optarg)) return 1; break;
case 'N': nreads = atoi(optarg); break;
case 'B': benchmark = 1; break;
case 'Z': extra_hdr_nuls = atoi(optarg); break;
case '@': nthreads = atoi(optarg); break;
}
}
if (argc == optind) {
fprintf(stderr, "Usage: samview [-bSCSIB] [-N num_reads] [-l level] [-o option=value] [-Z hdr_nuls] <in.bam>|<in.sam>|<in.cram> [region]\n");
fprintf(stderr, "Usage: test_view [-IbDCS] [-l level] [-t fn_ref] [-i option=value] [-o option=value] [-N num_reads] [-B] [-Z hdr_nuls] [-@ num_threads] <in.bam>|<in.sam>|<in.cram> [region]\n");
fprintf(stderr, "\n");
fprintf(stderr, "-D: read CRAM format (mode 'c')\n");
fprintf(stderr, "-S: read compressed BCF, BAM, FAI (mode 'b')\n");
fprintf(stderr, "-I: ignore SAM parsing errors\n");
fprintf(stderr, "-t: fn_ref: load CRAM references from the specified fasta file instead of @SQ headers when writing a CRAM file\n");
fprintf(stderr, "-i: option=value: set an option for CRAM input\n");
fprintf(stderr, "\n");
fprintf(stderr, "-b: write compressed BCF, BAM, FAI (mode 'b')\n");
fprintf(stderr, "-C: write CRAM format (mode 'c')\n");
fprintf(stderr, "-l 0-9: set zlib compression level\n");
fprintf(stderr, "-o option=value: set an option for CRAM output\n");
fprintf(stderr, "-N: num_reads: limit the output to the first num_reads reads\n");
fprintf(stderr, "\n");
fprintf(stderr, "-B: enable benchmarking\n");
fprintf(stderr, "-Z hdr_nuls: append specified number of null bytes to the SAM header\n");
fprintf(stderr, "-@ num_threads: use thread pool with specified number of threads\n");
return 1;
}
strcpy(moder, "r");