first#240

Draft
Whattabatt wants to merge 5 commits into main from davidg/config_flexibility

Conversation

@Whattabatt
Contributor

No description provided.

Contributor

@soldni soldni left a comment

mmmh this is tricky, would be nice to merge changes in soldni/backoff branch before touching linearizers

Comment thread python/dolma/warc/processor.py Outdated

# extract text
doc.text = linearizer.linearize(content=decoded_content)
if skip_linearization:
Contributor

can we have a no-op linearizer instead of a boolean flag?

Contributor Author

Will do
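
A minimal sketch of what the no-op linearizer suggested above could look like. Only the `linearizer.linearize(content=...)` call appears in the diff; the `Linearizer` protocol, the class names, and the `extract_text` helper are hypothetical and are not taken from the dolma codebase.

```python
import re
from typing import Protocol


class Linearizer(Protocol):
    """Hypothetical interface: only linearize(content=...) is visible in the diff."""

    def linearize(self, content: str) -> str:
        ...


class NoOpLinearizer:
    """Returns the decoded content unchanged, replacing a skip_linearization flag."""

    def linearize(self, content: str) -> str:
        return content


class TagStrippingLinearizer:
    """Illustrative stand-in for a real HTML-to-text linearizer."""

    def linearize(self, content: str) -> str:
        # Naive tag removal, for demonstration only.
        return re.sub(r"<[^>]+>", "", content)


def extract_text(linearizer: Linearizer, decoded_content: str) -> str:
    # The call site needs no boolean branch: skipping linearization
    # is expressed by which linearizer gets passed in.
    return linearizer.linearize(content=decoded_content)
```

With this shape, the choice to skip linearization moves into configuration (which linearizer is selected), and the processor keeps a single code path.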

Comment thread python/dolma/cli/tagger.py Outdated
default=False,
help="If true, only print the configuration and exit without running the taggers.",
)
document_dir: Optional[str] = field(
Contributor

should we parametrize attributes too?

Contributor Author

We could for the sake of symmetry but it has little utility that I can see
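
For reference, one possible reading of the symmetric option, sketched under assumptions: `document_dir` is taken to name the path component that holds input documents, and a hypothetical `attribute_dir` names the component that replaces it in the output path. The `TaggerPaths` class and its method are illustrative only and do not reflect the actual config in `python/dolma/cli/tagger.py`.

```python
from dataclasses import dataclass


@dataclass
class TaggerPaths:
    # Hypothetical config: only `document_dir` appears in the diff,
    # and its default value here is an assumption.
    document_dir: str = "documents"
    attribute_dir: str = "attributes"

    def attribute_path(self, document_path: str) -> str:
        """Map a document path to the path where its attributes would be written."""
        source_component = f"/{self.document_dir}/"
        if source_component not in document_path:
            raise ValueError(
                f"expected {source_component!r} in path {document_path!r}"
            )
        # Replace only the first occurrence so nested folders are left alone.
        return document_path.replace(
            source_component, f"/{self.attribute_dir}/", 1
        )
```

For example, `TaggerPaths().attribute_path("s3://bucket/data/documents/a.jsonl.gz")` would yield `"s3://bucket/data/attributes/a.jsonl.gz"` under these assumptions.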

@Whattabatt
Contributor Author

> mmmh this is tricky, would be nice to merge changes in soldni/backoff branch before touching linearizers

All good, I wasn't planning on pushing this yet, mostly wanted to validate the tests

@Whattabatt Whattabatt marked this pull request as draft February 14, 2025 18:30

2 participants