Thread-safe time zone cache via ReentrantLock #345
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #345      +/-   ##
==========================================
+ Coverage   93.60%   93.61%   +0.01%
==========================================
  Files          32       32
  Lines        1532     1536       +4
==========================================
+ Hits         1434     1438       +4
  Misses         98       98
```
@omus: Yeah, this seems like a valid approach as well. The problem is that with this approach only one task can access the cache at a time, which introduces a lot of contention. In particular, we may have as many as 64 threads constructing TimeZones in parallel, and we don't want to synchronize on every access to the cache. Using a read-write lock here would probably work, if you wanted to go with a locking approach.

The idea in #344 is that each thread builds the cache once, but since there are only a small number of threads (note: OS threads, not Julia Tasks), this cache initialization cost should only have to be paid a small handful of times: usually 4-8 on most systems, and as many as 32 or 64 on very wide systems. So there will be more warm-up cost than with a shared cache, but separate caches give the least contention. A read-write lock might be a happy medium, if we want to investigate that together.
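To make the per-thread idea concrete, here's a minimal sketch, under the assumptions that there is one cache per OS thread and that a task stays on its thread for the duration of a lookup (true for `@threads` loops at the time of writing). All names here are hypothetical, not the actual #344 code:

```julia
using TimeZones
using Base.Threads: nthreads, threadid

# One independent cache per OS thread (hypothetical; not the #344 code).
const THREAD_TZ_CACHES = [Dict{String,TimeZone}() for _ in 1:nthreads()]

# `load_timezone` stands in for the uncached constructor that reads from disk.
function cached_timezone_per_thread(load_timezone, name::AbstractString)
    cache = THREAD_TZ_CACHES[threadid()]  # thread-local, so no lock is needed
    get!(cache, name) do
        load_timezone(name)  # each thread pays the disk-read cost once per name
    end
end
```

Each thread pays the construction cost for each name once, in exchange for lock-free lookups afterwards.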
Lemme copy over the discussion from our internal (closed source) repo: we talked about using our internal SynchronizedCache (which is basically just a wrapper around a dict and a ReentrantLock, like you're doing here), and @comnik pointed out the issue I mentioned above. (@rai-nhdaly is my work account.)
Unfortunately we still haven't gotten around to open-sourcing our multithreading utilities, which we really should do :(
@NHDaly please correct me if I am wrong, but the way I think I've set this up should only result in lock contention in the case of a cache miss. In the case of a cache hit, no locking is required.
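If I'm reading this right, the pattern being described looks roughly like the sketch below (hypothetical names), with an unlocked fast path for hits; the safety of that unlocked read is exactly what the next comment questions:

```julia
using TimeZones

const TZ_CACHE = Dict{String,TimeZone}()  # hypothetical names
const TZ_CACHE_LOCK = ReentrantLock()

function cached_timezone_fastpath(load_timezone, name::AbstractString)
    # Unlocked fast path: return immediately on a cache hit.
    # NOTE: as explained below, this read is not actually safe against a
    # concurrent insert that resizes the Dict.
    haskey(TZ_CACHE, name) && return TZ_CACHE[name]
    # Slow path: take the lock only on a miss.
    lock(TZ_CACHE_LOCK) do
        get!(() -> load_timezone(name), TZ_CACHE, name)
    end
end
```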
Ah, I hadn't actually read through your code yet 😅 I don't think that will work, since concurrent reads and writes of the dictionary itself also need to be locked: if someone is in the middle of looking up a value in the dictionary when someone else inserts a new value, and that insert causes the hash table to be resized, the reader may end up reading garbage and/or it could crash. All accesses to the dictionary need to be synchronized, including cache reads. If we used a read/write lock, then you could do something more like what you have here, where at least all of the reads could proceed in parallel until there is a pending write. Does that match your understanding?
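For contrast, a minimal sketch of the fully synchronized variant, reusing the hypothetical `TZ_CACHE`/`TZ_CACHE_LOCK` names from the sketch above, where every access (hits included) holds the lock:

```julia
function cached_timezone_locked(load_timezone, name::AbstractString)
    lock(TZ_CACHE_LOCK) do
        # Reads must hold the lock too: a concurrent insert can trigger a
        # rehash of the Dict, and an unsynchronized reader could see garbage.
        get!(() -> load_timezone(name), TZ_CACHE, name)
    end
end
```

This is the safe form of the ReentrantLock approach, at the cost that every hit now contends on the lock, which is the contention concern raised earlier in the thread.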
I did a bunch of experimentation here to cover the essentials. One promising avenue was a readers-writer lock, which allows concurrent read-only access and exclusive write access; I implemented both of the versions mentioned on the Wikipedia page (a minimal sketch of the condition-variable flavor follows the benchmark results below). The main bottleneck turned out to be using locks at all on the concurrent-reader path, which should be the most common one. In an attempt to avoid that problem I tried an atomics-only implementation, which was faster (though there does appear to be a lingering race condition in my implementation), but still not quite as fast as #344. Here are some benchmark results:

```julia
using Base.Threads, TimeZones, BenchmarkTools

# Reset the time zone cache between runs (handles both the per-thread cache
# layout from #344 and the shared-cache layout).
function clear_cache()
if isdefined(TimeZones, :_tz_cache_init)
TimeZones._tz_cache_init()
else
empty!(TimeZones.TIME_ZONE_CACHE)
end
end
# Benchmark repeated construction of a single time zone name (after the first
# lookup this is all cache hits).
function f(v)
@threads for i in 1:length(v)
v[i] = TimeZone("America/Winnipeg")
end
return v
end
# Benchmark construction of random time zone names to mix hits and misses.
function g(v)
names = timezone_names()
@threads for i in 1:length(v)
name = rand(names)
tz = TimeZone(name, TimeZones.Class(:ALL))
@assert tz.name == name
v[i] = tz
end
return v
end
```

ReadersWriterLock implementation using only atomics:

```julia
julia> clear_cache(); @btime f($(Vector{TimeZone}(undef, 100)));
14.000 μs (320 allocations: 12.52 KiB)
julia> clear_cache(); @btime f($(Vector{TimeZone}(undef, 1000)));
101.541 μs (3510 allocations: 118.62 KiB)
julia> clear_cache(); @btime f($(Vector{TimeZone}(undef, 10000)));
1.061 ms (39510 allocations: 1.21 MiB)
julia> clear_cache(); @btime g($(Vector{TimeZone}(undef, 100)));
2.256 ms (7266 allocations: 515.02 KiB)
julia> clear_cache(); @btime g($(Vector{TimeZone}(undef, 1000)));
2.335 ms (11355 allocations: 658.98 KiB)
julia> clear_cache(); @btime g($(Vector{TimeZone}(undef, 1000))); # race-condition
```

Implementation from #344:

```julia
julia> clear_cache(); @btime f($(Vector{TimeZone}(undef, 100)));
7.625 μs (321 allocations: 12.55 KiB)
julia> clear_cache(); @btime f($(Vector{TimeZone}(undef, 1000)));
30.500 μs (3509 allocations: 118.59 KiB)
julia> clear_cache(); @btime f($(Vector{TimeZone}(undef, 10000)));
273.083 μs (39510 allocations: 1.21 MiB)
julia> clear_cache(); @btime g($(Vector{TimeZone}(undef, 100)));
2.245 ms (7266 allocations: 514.94 KiB)
julia> clear_cache(); @btime g($(Vector{TimeZone}(undef, 1000)));
2.340 ms (11355 allocations: 659.00 KiB)
julia> clear_cache(); @btime g($(Vector{TimeZone}(undef, 10000)));
3.092 ms (56355 allocations: 2.12 MiB)
```
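For reference, here's a minimal sketch of the condition-variable flavor of readers-writer lock mentioned above. This is a hedged illustration, not the atomics-only implementation benchmarked here; the `RWLock` type and `with_read_lock`/`with_write_lock` helpers are hypothetical names (Base provides no readers-writer lock):

```julia
# Hypothetical readers-writer lock built on a condition variable.
mutable struct RWLock
    cond::Threads.Condition  # the condition's lock doubles as the mutex
    readers::Int             # number of active readers
    writing::Bool            # whether a writer currently has exclusive access
end
RWLock() = RWLock(Threads.Condition(), 0, false)

function with_read_lock(f, rw::RWLock)
    lock(rw.cond)
    try
        while rw.writing
            wait(rw.cond)  # releases the mutex while blocked
        end
        rw.readers += 1
    finally
        unlock(rw.cond)
    end
    try
        return f()  # readers may run concurrently with each other
    finally
        lock(rw.cond)
        try
            rw.readers -= 1
            rw.readers == 0 && notify(rw.cond)  # wake any pending writer
        finally
            unlock(rw.cond)
        end
    end
end

function with_write_lock(f, rw::RWLock)
    lock(rw.cond)
    try
        while rw.writing || rw.readers > 0
            wait(rw.cond)
        end
        rw.writing = true
    finally
        unlock(rw.cond)
    end
    try
        return f()  # exclusive access
    finally
        lock(rw.cond)
        try
            rw.writing = false
            notify(rw.cond)  # wake pending readers and writers
        finally
            unlock(rw.cond)
        end
    end
end
```

A shared cache would then wrap Dict lookups in `with_read_lock` and inserts in `with_write_lock`, so reads proceed in parallel until a write is pending. Note that this simple version favors readers and can starve a writer under constant read load.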
Thanks Curtis! Great analysis. In #344, we're trading off time for space, but I think for a small, bounded-size cache like this one, that's a reasonable tradeoff. 👍
Experimenting with an alternative approach to #344. Part of the reason for having the time zone cache is to avoid having to read from disk. The approach in #344 results in each thread having to load data from disk, which I think is slower than this approach (performance tests to come).
Fixes #342