
Commit c872f28

Consolidate Dataflow documentation
1 parent 65efb02 commit c872f28

3 files changed

Lines changed: 14 additions & 86 deletions


doc/Index.md

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -23,7 +23,6 @@ VS Project System Documentation
2323
- [Responsive design](overview/responsive_design.md)
2424
- [Globbing behavior](overview/globbing_behavior.md)
2525
- [Dataflow](overview/dataflow.md)
26-
- [Dataflow in CPS](overview/dataflow_in_CPS.md)
2726
- [Dataflow source blocks](extensibility/dataflow_sources.md)
2827
- Diagnostics
2928
- [How to examine Visual Studio registry](overview/examine_registry.md)

doc/overview/dataflow.md

Lines changed: 13 additions & 3 deletions
@@ -229,6 +229,8 @@ CPS has a few subclasses of `DataflowLinkOptions` that you can use in certain ci
229229

230230
# Dataflow in CPS
231231

232+
One of the main goals of CPS is to move the bulk of the project system work to background threads while still maintaining data consistency. To accomplish this, CPS leverages Dataflow to produce a versioned, immutable, producer-consumer pattern to flow changes through the project system. Dataflow is not always easy, and if used wrongly can lead to corrupt state and deadlocks.
233+
232234
## Slim blocks
233235

234236
TPL's Dataflow blocks are general purpose and have features that aren't used in CPS. Those unused features come with a performance/memory cost. To improve the scalability of CPS in large solutions, we have a replacement set of "slim" blocks that provide the required behaviours of TPL's blocks, but without the overhead associated with the unused features.
@@ -245,7 +247,9 @@ TPL's Dataflow blocks are general purpose and have features that aren't used in
245247

246248
Dataflow graphs publish immutable snapshots of data between blocks, where updates are pushed through the graph in an asynchronous fashion. This gives the framework a lot of flexibility to schedule the work, but can make it difficult to know when a given input has made its way through the graph to the outputs.
247249

248-
Another challenge with Dataflow graphs is joining data. Consider the following graph:
250+
Dataflow is simple when you have a single line of dependencies, but in CPS it is much more complex. It is common for a chained datasource to require input from multiple upstream sources. It is also common for those upstream sources to also have multiple inputs. This pattern introduces a data consistency problem.
251+
252+
Consider the following Dataflow graph:
249253

250254
```mermaid
251255
flowchart LR
@@ -271,7 +275,7 @@ public interface IProjectValueVersions
271275
}
272276
```
273277

274-
And in fact, a versioned value can have _more than one version!_ This makes sense when you consider that a given node in the graph can have more than one source block feeding in to it. Each of those source blocks provides its own versioned value, and as messages are joined, the sets of versions are merged.
278+
And in fact, a versioned value can have _more than one version!_ This makes sense when you consider that a given node in the graph can have more than one source block feeding in to it. Each of those source blocks provides its own versioned value, and as messages are joined the sets of versions are merged.
275279

276280
```mermaid
277281
flowchart LR
@@ -347,6 +351,12 @@ IDisposable link = ProjectDataSources.SyncLinkTo(
347351

348352
The `SyncLinkOptions` extension method allows the data source to be configured. If the source contains rule-based data (discussed [below](#rule-sources))
349353

354+
### Allowing inconsistent versions
355+
356+
In special cases that require it, it is possible to allow for inconsistent versions in your Dataflow. This is for when you depend on multiple upstream sources where one is drastically slower at producing values than others, but you want to be able to produce intermediate values while the slow one is still processing. An example of this is where you want data quickly from project evaluation, and also want the richer data that arrives later via design-time builds.
357+
358+
Unfortunately, there is no built-in support for this scenario. You will have to manually link to your upstream sources and synchronize between them. When producing chained output, to calculate the data versions to publish you may be able to use `ProjectDataSources.MergeDataSourceVersions`.
359+
350360
## Subscribing to project data
351361

352362
One of the main use cases for Dataflow in CPS is the processing of project data. Unlike the legacy CSPROJ project system where updates were generally applied on a single thread (the main thread), CPS uses Dataflow to schedule updates asynchronously on the thread pool.
@@ -476,7 +486,7 @@ CPS provides access to several such `IProjectValueDataSource<T>` instances via `
476486

477487
### Chained (derived) data sources
478488

479-
Most `IProjectValueDataSource<T>` instances will produce data that was derived from other project value data sources. CPS provides the abstract base class `ChainedProjectValueDataSourceBase<T>`, which makes creating such a derived (chained) source easy.
489+
Most `IProjectValueDataSource<T>` instances will produce data that was derived from one or more other project value data sources. CPS provides the abstract base class `ChainedProjectValueDataSourceBase<T>`, which makes creating such a derived (chained) source easy.
480490

481491
Let's look at an example of overriding this class to create a new data source that derives its data from one other source:
482492

doc/overview/dataflow_in_CPS.md

Lines changed: 1 addition & 82 deletions
@@ -1,84 +1,3 @@
11
# Dataflow in CPS
22

3-
One of the main goals of CPS is to move the bulk of the project system work to background threads,
4-
while still maintaining data consistency. To accomplish this, CPS leverages the [TPL.Dataflow](https://learn.microsoft.com/dotnet/standard/parallel-programming/dataflow-task-parallel-library)
5-
library to produce a versioned, immutable, producer-consumer pattern to flow changes through the
6-
project system. Dataflow is not always easy, and if used wrong it can quickly lead to corrupt
7-
project states or deadlocks.
8-
9-
## Types of Dataflow in CPS
10-
11-
Dataflow in CPS comes primarily in two types, an original source or a chained source.
12-
13-
1. Original Source
14-
* Depends on an original source of data that is not part of dataflow.
15-
* Always has its own version.
16-
* IE: a file on disk
17-
2. Chained Source
18-
* Chains into existing dataflow.
19-
* Can be one or multiple dataflow blocks that feed into this one.
20-
* Very __rarely__ has its own version. Typically if it does, it can
21-
be pulled out into an original source.
22-
* Carries all the versions of the dataflow it chains to.
23-
* More about versioning later
24-
25-
## Data Consistency Problem
26-
27-
Dataflow is simple when you have a single line of dependencies, but in CPS it is much more complex.
28-
It is common for a chained datasource to require input from multiple upstream sources. It is also
29-
common for those upstream sources to also have multiple inputs. This pattern introduces a data
30-
consistency problem. Take a look at the dataflow diagram below (arrows represent dataflow):
31-
32-
```mermaid
33-
flowchart LR
34-
A --> C
35-
C --> D
36-
B --> C
37-
B --> D
38-
```
39-
40-
In the above layout, `A` and `B` are original sources. `C` listens to both `A` and `B`, but since
41-
they are _original_ sources `C` can produce a new value when either change. `D` is where it gets
42-
complex. `D` can only produce values when it has `B` and `C` of the same source version. `D` only
43-
produces a value when the version of `C` it has was produced from the same version of `B` that
44-
`D` currently has. To solve this consistency issue CPS versions all dataflow and then synchronizes
45-
around these published versions.
46-
47-
## Dataflow Versioning
48-
49-
To solve the problem described above, all dataflow in CPS produces types of `IProjectVersionedValue<T>`.
50-
This type combines `T Value` and `IImmutableDictionary<NamedIdentity, IComparable> DataSourceVersions`.
51-
52-
Then, chained dataflow will carry the versions of its upstream data sources. When a chained source has
53-
multiple upstream sources its published version becomes the merged value of its upstream sources.
54-
This functionality is facilitated via `ProjectDataSources.SyncLinkTo`. When using that method to link
55-
to multiple upstream sources, a middle dataflow block is created that only publishes to your block when
56-
all received values are in a consistent state. See [this example](../extensibility/dataflow_example.md#chained-data-source-multiple-sources)
57-
for how to use `SyncLinkTo`.
58-
59-
### Rules to Follow with Versioning
60-
61-
__When you are a...__
62-
* __Original source__ you have your own `DataSourceKey` and `DataSourceVersion`. The key
63-
identifies who you are, and the version must increment whenever you produce a new value.
64-
The only value in your `DataSourceVersions` published is your own.
65-
* __Chained source__ you must merge and carry the versions of all dataflow you are chained
66-
to in your own `DataSourceVersions`. You very rarely have your own version because your
67-
version is just the combined versions that you chained to. If you do need your own version,
68-
consider pulling the part that publishes the original data into its own source.
69-
70-
### Allowing Inconsistent Versions
71-
72-
In special cases that require it, it is possible to allow for inconsistent versions in your dataflow.
73-
This is for when you depend on multiple upstream sources where one is drastically slower at producing
74-
values than others, but you want to be able to produce intermediate values while the slow one is still
75-
processing. Unfortunately, there is no CPS base class equivalent to `ProjectValueDataSourceBase` or
76-
`ChainedProjectValueDataSourceBase` for this scenario. You will have to manually link to your upstream
77-
sources and synchronize between multiple sources publishing at once. For calculating the data versions
78-
to publish, use `ProjectDataSources.MergeDataSourceVersions`.
79-
80-
## Further reading
81-
82-
- [Dataflow Examples](../extensibility/dataflow_example.md)
83-
- [Dataflow Sources](../extensibility/dataflow_sources.md)
84-
- [Dataflow Best Practices](../extensibility/dataflow_best_practices.md)
3+
Moved to [Dataflow in CPS](dataflow#dataflow-in-cps).
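
The consistency rule that the consolidated text describes (`D` may only publish when its copy of `B` has the same version that `C` was produced from) can be sketched in a few lines of C#. This is a minimal illustration under stated assumptions, not CPS code: versions are modelled as a plain `Dictionary<string, int>` rather than `IImmutableDictionary<NamedIdentity, IComparable>`, and `AreConsistent` and `Merge` are hypothetical helpers that do not reproduce the behaviour of `ProjectDataSources.SyncLinkTo` or `ProjectDataSources.MergeDataSourceVersions`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class VersionSketch
{
    // Hypothetical helper: two version sets are consistent when they agree
    // on the version of every source they have in common.
    public static bool AreConsistent(
        IReadOnlyDictionary<string, int> left,
        IReadOnlyDictionary<string, int> right)
    {
        return left.All(pair =>
            !right.TryGetValue(pair.Key, out int other) || other == pair.Value);
    }

    // Hypothetical helper: a chained source carries the union of the
    // versions of everything upstream of it.
    public static Dictionary<string, int> Merge(
        IReadOnlyDictionary<string, int> left,
        IReadOnlyDictionary<string, int> right)
    {
        var merged = new Dictionary<string, int>();
        foreach (var pair in left) { merged[pair.Key] = pair.Value; }
        foreach (var pair in right) { merged[pair.Key] = pair.Value; }
        return merged;
    }

    public static void Main()
    {
        // Original source B has published two values so far.
        var b1 = new Dictionary<string, int> { ["B"] = 1 };
        var b2 = new Dictionary<string, int> { ["B"] = 2 };

        // C was computed from A version 1 and B version 1.
        var c = Merge(new Dictionary<string, int> { ["A"] = 1 }, b1);

        // D holds B and C. It may publish only when both agree on B.
        Console.WriteLine(AreConsistent(b1, c)); // True
        Console.WriteLine(AreConsistent(b2, c)); // False: C has not yet seen B version 2

        // When consistent, D publishes with the merged versions of its inputs.
        var d = Merge(b1, c);
        Console.WriteLine(string.Join(", ", d.OrderBy(p => p.Key).Select(p => $"{p.Key}={p.Value}"))); // A=1, B=1
    }
}
```

In the inconsistent case, a synchronizing link would hold the newer `B` value until `C` catches up. The "Allowing inconsistent versions" text covers the case where a block chooses to publish anyway.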
