Add 12 Indic language hyphenation patterns #30
laurmaedje
left a comment
Thanks for the PR! Just one small remark.
- By default, this crate supports hyphenating more than 30 languages. Embedding
- automata for all these languages will add ~1.1 MiB to your binary.
+ By default, this crate supports hyphenating 48 languages. Embedding
+ automata for all these languages will add ~1.3 MiB to your binary.
On main, I get 1162474B if I sum all the file sizes (via find . -type f -exec stat -f '%z' {} + | awk '{sum += $1} END {print sum}'). On your branch, I get 1166722B, which is barely more. Divided by
Did you make a different calculation?
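For reference, the byte-summing approach from the comment above can be sketched in Rust. This is only an illustration, not code from this PR or from the crate: the helper names `total_file_bytes` and `to_mib` are hypothetical, and the two totals passed to `to_mib` are the ones quoted above.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sums the exact byte lengths of all regular files under `dir`, recursively.
/// (Hypothetical helper; mirrors the `find ... | awk` pipeline quoted above.)
fn total_file_bytes(dir: &Path) -> io::Result<u64> {
    let mut total = 0;
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // `DirEntry::metadata` does not follow symlinks, so links are skipped.
        let meta = entry.metadata()?;
        if meta.is_dir() {
            total += total_file_bytes(&entry.path())?;
        } else if meta.is_file() {
            total += meta.len();
        }
    }
    Ok(total)
}

/// Converts a byte count to MiB (1 MiB = 1024 * 1024 bytes).
fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}

fn main() -> io::Result<()> {
    // Totals quoted in the review comment.
    println!("main:   {:.2} MiB", to_mib(1_162_474));
    println!("branch: {:.2} MiB", to_mib(1_166_722));

    // Example usage: sum the files below the current directory.
    let bytes = total_file_bytes(Path::new("."))?;
    println!("current dir: {} B = {:.2} MiB", bytes, to_mib(bytes));
    Ok(())
}
```

Both quoted totals come out at roughly 1.11 MiB when divided by 1024 * 1024.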
With these new languages, this is what I see:
du -hsc *
60K af.bin
4.0K as.bin
4.0K be.bin
16K bg.bin
4.0K bn.bin
4.0K ca.bin
40K cs.bin
8.0K da.bin
204K de.bin
4.0K el.bin
28K en.bin
16K es.bin
20K et.bin
4.0K fi.bin
8.0K fr.bin
8.0K gl.bin
4.0K gu.bin
4.0K hi.bin
4.0K hr.bin
348K hu.bin
24K is.bin
4.0K it.bin
12K ka.bin
4.0K kn.bin
4.0K ku.bin
4.0K la.bin
8.0K lt.bin
4.0K ml.bin
8.0K mn.bin
4.0K mr.bin
64K nl.bin
156K no.bin
4.0K or.bin
4.0K pa.bin
16K pl.bin
4.0K pt.bin
36K ru.bin
4.0K sa.bin
16K sk.bin
8.0K sl.bin
4.0K sq.bin
16K sr.bin
24K sv.bin
4.0K ta.bin
4.0K te.bin
4.0K tk.bin
4.0K tr.bin
24K uk.bin
1.3M total
But your method is more accurate, since du counts blocks on disk and not the actual file size (a 200 B file will use 4K on disk, as that is one block). I will keep the number at ~1.1.
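The gap between the two measurements can be reproduced with a small Rust sketch. It assumes a Unix system (it relies on `std::os::unix::fs::MetadataExt`), the function names are hypothetical, and the 4096-byte block size is only the example value from the comment above. Real filesystems may allocate differently, for example with inline storage or compression.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt; // Unix-only: provides `blocks()`
use std::path::Path;

/// Returns (apparent size, allocated size) in bytes for one file.
fn apparent_and_allocated(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    // `blocks()` is reported in 512-byte units, independent of the
    // filesystem's own block size.
    Ok((meta.len(), meta.blocks() * 512))
}

/// Rounds `len` up to a whole number of `block_size`-byte blocks, which is
/// roughly what a block-based tool reports for a small file.
fn round_up_to_block(len: u64, block_size: u64) -> u64 {
    (len + block_size - 1) / block_size * block_size
}

fn main() -> io::Result<()> {
    // Write a 200-byte scratch file, as in the example from the comment.
    let path = std::env::temp_dir().join("block_size_demo.bin");
    fs::write(&path, vec![0u8; 200])?;

    let (apparent, allocated) = apparent_and_allocated(&path)?;
    println!("apparent:  {} B", apparent);
    println!("allocated: {} B (filesystem dependent)", allocated);
    println!("rounded to 4096-byte blocks: {} B", round_up_to_block(apparent, 4096));

    fs::remove_file(&path)?;
    Ok(())
}
```

Summing the apparent sizes gives the number that matters for the binary, since embedded data is not padded to disk blocks.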
Just to clarify: On the Typst side, you'd be on board with keeping this internal as an automatic language-based property, right? And since which character is used is not defined by hypher, I don't think adjustments here would be necessary.
- Assamese (as)
- Bengali (bn)
- Gujarati (gu)
- Hindi (hi)
- Kannada (kn)
- Malayalam (ml)
- Marathi (mr)
- Oriya (or)
- Panjabi (pa)
- Sanskrit (sa)
- Tamil (ta)
- Telugu (te)
bc9fd77 to
01dd84b
Compare
I am less familiar with these systems, so please correct me if I am wrong. I was under the assumption that Hypher supplies language-based properties to Typst. For example, the
Ah, that's a fair view on things. I would be fine with that!
Thank you!
Following languages are added
As discussed in typst/typst#8033, this PR adds 12 Indic languages to Hypher. As a follow-up, I will attempt to define a hyphenation character property (in another PR).
Except for Sanskrit, all the hyphenation patterns were authored by me. The license is permissive (MIT).