|
1 | 1 | # Dataflow in CPS |
2 | 2 |
|
3 | | -One of the main goals of CPS is to move the bulk of the project system work to background threads, |
4 | | -while still maintaining data consistency. To accomplish this, CPS leverages the [TPL.Dataflow](https://learn.microsoft.com/dotnet/standard/parallel-programming/dataflow-task-parallel-library) |
5 | | -library to produce a versioned, immutable, producer-consumer pattern to flow changes through the |
6 | | -project system. Dataflow is not always easy, and if used incorrectly it can quickly lead to corrupt |
7 | | -project states or deadlocks. |
8 | | - |
9 | | -## Types of Dataflow in CPS |
10 | | - |
11 | | -Dataflow in CPS comes primarily in two types, an original source or a chained source. |
12 | | - |
13 | | -1. Original Source |
14 | | - * Depends on an original source of data that is not part of dataflow. |
15 | | - * Always has its own version. |
16 | | - * Example: a file on disk |
17 | | -2. Chained Source |
18 | | - * Chains into existing dataflow. |
19 | | - * Can be one or multiple dataflow blocks that feed into this one. |
20 | | - * Very __rarely__ has its own version. Typically if it does, it can |
21 | | - be pulled out into an original source. |
22 | | - * Carries all the versions of the dataflow it chains to. |
23 | | - * More about versioning later |
24 | | - |
25 | | -## Data Consistency Problem |
26 | | - |
27 | | -Dataflow is simple when you have a single line of dependencies, but in CPS it is much more complex. |
28 | | -It is common for a chained datasource to require input from multiple upstream sources. It is also |
29 | | -common for those upstream sources to also have multiple inputs. This pattern introduces a data |
30 | | -consistency problem. Take a look at the dataflow diagram below (arrows represent dataflow): |
31 | | - |
32 | | -```mermaid |
33 | | -flowchart LR |
34 | | - A --> C |
35 | | - C --> D |
36 | | - B --> C |
37 | | - B --> D |
38 | | -``` |
39 | | - |
40 | | -In the above layout, `A` and `B` are original sources. `C` listens to both `A` and `B`, but since |
41 | | -they are _original_ sources `C` can produce a new value when either changes. `D` is where it gets |
42 | | -complex. `D` can only produce values when it has `B` and `C` of the same source version. `D` only |
43 | | -produces a value when the version of `C` it has was produced from the same version of `B` that |
44 | | -`D` currently has. To solve this consistency issue CPS versions all dataflow and then synchronizes |
45 | | -around these published versions. |
46 | | - |
47 | | -## Dataflow Versioning |
48 | | - |
49 | | -To solve the problem described above, all dataflow in CPS produces types of `IProjectVersionedValue<T>`. |
50 | | -This type combines `T Value` and `IImmutableDictionary<NamedIdentity, IComparable> DataSourceVersions`. |
51 | | - |
52 | | -Then, chained dataflow will carry the versions of its upstream data sources. When a chained source has |
53 | | -multiple upstream sources, its published version becomes the merged value of its upstream sources. |
54 | | -This functionality is facilitated via `ProjectDataSources.SyncLinkTo`. When using that method to link |
55 | | -to multiple upstream sources, a middle dataflow block is created that only publishes to your block when |
56 | | -all received values are in a consistent state. See [this example](../extensibility/dataflow_example.md#chained-data-source-multiple-sources) |
57 | | -for how to use `SyncLinkTo`. |
58 | | - |
59 | | -### Rules to Follow with Versioning |
60 | | - |
61 | | -__When you are a...__ |
62 | | -* __Original source__ you have your own `DataSourceKey` and `DataSourceVersion`. The key |
63 | | - identifies who you are, and the version must increment whenever you produce a new value. |
64 | | - The only value in your `DataSourceVersions` published is your own. |
65 | | -* __Chained source__ you must merge and carry the versions of all dataflow you are chained |
66 | | - to in your own `DataSourceVersions`. You very rarely have your own version because your |
67 | | - version is just the combined versions that you chained to. If you do need your own version, |
68 | | - consider pulling the part that publishes the original data into its own source. |
69 | | - |
70 | | -### Allowing Inconsistent Versions |
71 | | - |
72 | | -In special cases that require it, it is possible to allow for inconsistent versions in your dataflow. |
73 | | -This is for when you depend on multiple upstream sources where one is drastically slower at producing |
74 | | -values than others, but you want to be able to produce intermediate values while the slow one is still |
75 | | -processing. Unfortunately, there is no CPS base class equivalent to `ProjectValueDataSourceBase` or |
76 | | -`ChainedProjectValueDataSourceBase` for this scenario. You will have to manually link to your upstream |
77 | | -sources and synchronize between multiple sources publishing at once. For calculating the data versions |
78 | | -to publish, use `ProjectDataSources.MergeDataSourceVersions`. |
79 | | - |
80 | | -## Further reading |
81 | | - |
82 | | -- [Dataflow Examples](../extensibility/dataflow_example.md) |
83 | | -- [Dataflow Sources](../extensibility/dataflow_sources.md) |
84 | | -- [Dataflow Best Practices](../extensibility/dataflow_best_practices.md) |
| 3 | +Moved to [Dataflow in CPS](dataflow#dataflow-in-cps). |