Migrate Confluence XML export to MediaWiki import data

This is a command-line tool that converts the contents of a Confluence space into MediaWiki import data. See also the official BlueSpice Helpdesk entry.

Docker

The migrate-confluence tool is available as a Docker image (bluespice/migrate-confluence).

Workflow

Export "space" from Confluence

Create an export of your Confluence space (one XML export for each space).

Save it to a location that is accessible by this tool (e.g. /tmp/confluence/input/Confluence-export.zip)
Create the input directory (e.g. /tmp/confluence/input)
Extract the ZIP file (e.g. /tmp/confluence/input/Confluence-export)
1. The folder should contain the files entities.xml and exportDescriptor.properties, as well as the folder attachments
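
The folder layout described above can be checked before starting the migration. This is a minimal sketch in POSIX sh; the function name check_export_dir is hypothetical and not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper, not part of migrate-confluence: verify that an
# extracted Confluence export contains the files the tool expects.
check_export_dir() {
    dir=$1
    status=0
    # Both files must exist in the extracted export folder.
    for f in entities.xml exportDescriptor.properties; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $dir/$f" >&2
            status=1
        fi
    done
    # The attachments folder must exist as well.
    if [ ! -d "$dir/attachments" ]; then
        echo "missing folder: $dir/attachments" >&2
        status=1
    fi
    return $status
}
```

For example, `check_export_dir /tmp/confluence/input/Confluence-export` returns a non-zero status and lists what is missing if the ZIP file was not extracted completely.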

Migrate the contents

Create the "workspace" directory (e.g. /tmp/confluence/workspace/ )
From the parent directory (e.g. /tmp/ ), run the migration commands
1. Run docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest analyze --src=/data/input --dest=/data/workspace to create "working files". After the script has run you can check those files and maybe apply changes if required (e.g. when applying structural changes).
2. Run docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest extract --src=/data/input --dest=/data/workspace to extract all contents, like wikipage contents, attachments and images into the workspace
3. Check the database tables logging, page_invalid_titles, blog_post_invalid_titles, page_template_invalid_titles and attachment_invalid_titles. Modify titles if necessary.
4. Run docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest convert --src=/data/workspace --dest=/data/workspace (yes, --src /data/workspace/ ) to convert the wikipage contents from Confluence Storage XML to MediaWiki WikiText. For large spaces, see Parallel convert below.
5. Check database tables logging, body_contents, page_template_contents
6. Run docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest compose --src=/data/workspace --dest=/data/workspace (yes, --src /data/workspace/ ) to create importable data
7. Check the log files in the workspace directory for errors, especially skipped_pages.log. Pages logged in this file are not part of the MediaWiki import data.

Important: If you re-run the scripts you will need to clean up the "workspace" directory!
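
The four docker commands above differ only in the stage name and the --src value, so they can be wrapped in a small script. The following is a sketch, not something shipped with the tool: the function names and the DRY_RUN switch are hypothetical. The manual checks between the stages still apply, so running everything in one go is mainly useful for dry runs or for repeat runs on a cleaned workspace.

```shell
#!/bin/sh
# Image name, mount and paths are taken from the steps above.
IMAGE="bluespice/migrate-confluence:latest"

# Run (or, with DRY_RUN=1, only print) one migration stage.
# DRY_RUN is a hypothetical switch of this sketch, not a tool option.
run_stage() {
    stage=$1
    src=$2
    set -- docker run -v "$(pwd)/confluence:/data" "$IMAGE" \
        "$stage" "--src=$src" --dest=/data/workspace
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# All four stages in order; stops at the first failing stage.
migrate_all() {
    run_stage analyze /data/input || return 1
    run_stage extract /data/input || return 1
    # Manual checkpoint: review the *_invalid_titles tables here.
    run_stage convert /data/workspace || return 1
    # Manual checkpoint: review body_contents and page_template_contents.
    run_stage compose /data/workspace || return 1
}
```

Run it from the parent directory (e.g. /tmp/). With `DRY_RUN=1` set, `migrate_all` prints the four commands without starting any container.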

Import into MediaWiki

Copy the "workspace/result" directory (e.g. /tmp/confluence/workspace/result/) to your target wiki server (e.g. /tmp/result)
Go to your MediaWiki installation directory
Make sure you have the target namespaces set up properly. See workspace/space-id-to-prefix-map.php for reference.
Make sure $wgFileExtensions is set up properly. See workspace/deployment.log for reference.
Use php extensions/BlueSpiceDistributionConnector/maintenance/importFiles.php --src=/tmp/result/files.xml to first import all attachment files and images
Use php maintenance/importDump.php /tmp/result/pages.xml to import the actual pages. Use the same command to import blog.xml, comments.xml and templates.xml, but not user.xml. That file cannot be imported; it is provided only to make the user data available.

You may need to run php maintenance/rebuildAll.php and update your MediaWiki search index afterwards.
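
The import order described above (files first, then the page dumps, never user.xml) can be sketched as a shell function. The function name import_result and the PHP variable are hypothetical; skipping dumps that do not exist is an assumption made for this sketch, for example for spaces without blog posts.

```shell
#!/bin/sh
# PHP lets you point at a specific interpreter; defaults to "php".
PHP=${PHP:-php}

# Must be run from the MediaWiki installation directory, because the
# maintenance script paths below are relative to it.
import_result() {
    result_dir=$1
    # 1. Attachments and images first.
    "$PHP" extensions/BlueSpiceDistributionConnector/maintenance/importFiles.php \
        --src="$result_dir/files.xml" || return 1
    # 2. Page dumps. user.xml is deliberately left out of this list.
    for dump in pages.xml blog.xml comments.xml templates.xml; do
        # Assumption of this sketch: skip dumps that were not generated.
        [ -f "$result_dir/$dump" ] || continue
        "$PHP" maintenance/importDump.php "$result_dir/$dump" || return 1
    done
}
```

Usage: `import_result /tmp/result`. Setting `PHP=echo` beforehand prints the commands instead of running them.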

Config file

It is possible to use a YAML file to configure the commands. As an example see /doc/config.sample.yaml. The configuration file can be applied by adding the option --config /data/config.yaml.

The config file does not have to contain every parameter from config.sample.yaml. For any parameter that is left out, the default value is used.

Parallel convert

For large Confluence spaces the convert step can be slow. You can speed it up by running multiple worker processes in parallel using the --workers option.

docker run -v $(pwd)/confluence:/data bluespice/migrate-confluence:latest convert \
  --src=/data/workspace --dest=/data/workspace \
  --workers=4

The command spawns the requested number of child processes automatically. Each worker handles a disjoint slice of the file list, so every file is converted exactly once. Progress lines are prefixed with [Worker N] so you can follow each process individually. If any worker fails the command exits with a non-zero status and reports which workers were affected.

Choose --workers based on the number of available CPU cores. A value between 2 and 8 is typical; there is no benefit in exceeding the number of cores on your machine.

Note: --workers=1 (the default) behaves identically to running without the option; no child processes are spawned.
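
The sizing advice above can be expressed as a small helper that clamps the core count to a sensible range. This is only a sketch: pick_workers is a hypothetical name, and the default upper bound of 8 simply mirrors the "typical" range mentioned above rather than any limit enforced by the tool.

```shell
#!/bin/sh
# Hypothetical helper: choose a --workers value from a core count.
# $1 = number of available CPU cores, $2 = optional upper bound (default 8).
pick_workers() {
    cores=$1
    max=${2:-8}
    # Never go below one worker.
    if [ "$cores" -lt 1 ]; then
        cores=1
    fi
    # Never exceed the upper bound.
    if [ "$cores" -gt "$max" ]; then
        echo "$max"
    else
        echo "$cores"
    fi
}
```

On Linux the result can be passed as `--workers="$(pick_workers "$(nproc)")"`; on macOS, `sysctl -n hw.ncpu` reports the core count instead of `nproc`.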

Extension:NSFileRepo compatibility

The migrate-confluence tool is compatible with the MediaWiki extension https://www.mediawiki.org/wiki/Extension:NSFileRepo, which restricts access to files and images to a given set of user groups associated with protected namespaces.

User spaces

In Confluence, user spaces are protected. In MediaWiki this is not possible for the namespace User. Therefore user spaces are migrated to a namespace User<username>, which can be protected in BlueSpice for MediaWiki.

Included MediaWiki wikitext templates

AttachmentsSectionEnd
AttachmentsSectionStart
Details
DetailsSummary
Excerpt
ExcerptInclude
Info
InlineComment
Note
Panel
RecentlyUpdated
SubpageList
SubpageListRow
Tip
Warning
PageTree
SpaceDetails
ViewFile
(and more)

Be aware that those pages may be overwritten by the import if they already exist in the target wiki.

Included upload files

Icon-info.svg
Icon-note.svg
Icon-tip.svg
Icon-warning.svg

Be aware that those files may be overwritten by the import if they already exist in the target wiki.

MediaWiki settings

In case your pages contain a lot of external images (<img /> elements), be aware that MediaWiki does not show them by default. You'd need to configure $wgAllowExternalImages. Read https://www.mediawiki.org/wiki/Manual:$wgAllowExternalImages for more information.

Jira interwiki links

Confluence pages that contain Jira macros are converted to use MediaWiki interwiki links. Two separate prefixes are used because Jira issue keys and JQL queries have different URL patterns:

| Interwiki prefix | Purpose | Example URL pattern |
|---|---|---|
| `jira` | Link to a specific Jira issue by key | `https://jira.example.com/browse/$1` |
| `jira-jql` | Link to a Jira issue list filtered by JQL | `https://jira.example.com/issues/?jql=$1` |

Add both entries to the interwiki table of your MediaWiki database, or configure them via $wgExtraInterlanguageLinkPrefixes and the interwiki cache. Replace https://jira.example.com with the base URL of your Jira instance.
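
To illustrate how the URL patterns work: MediaWiki replaces the $1 placeholder with the part of the link that follows the prefix. The sketch below mimics that substitution in POSIX sh. The function name expand_interwiki and the issue key PROJ-123 are made up for illustration, and unlike MediaWiki the sketch does not URL-encode the target, which matters for JQL queries that contain spaces or quotes.

```shell
#!/bin/sh
# Hypothetical illustration of the "$1" substitution in interwiki URLs.
# $1 = URL pattern containing the literal text $1, $2 = link target.
expand_interwiki() {
    pattern=$1
    target=$2
    # Refuse patterns without a placeholder.
    case $pattern in
        *\$1*) ;;
        *) echo "pattern has no \$1 placeholder" >&2; return 1 ;;
    esac
    prefix=${pattern%%\$1*}   # everything before the first $1
    suffix=${pattern#*\$1}    # everything after the first $1
    printf '%s%s%s\n' "$prefix" "$target" "$suffix"
}
```

For example, `expand_interwiki 'https://jira.example.com/browse/$1' 'PROJ-123'` prints `https://jira.example.com/browse/PROJ-123`.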

File revisions

The tool has experimental support for file revisions. Enable them with the config option

    # enable BETA support for file revisions
    include-history: true

There is a good chance of problems and edge cases, though. Take care to validate the output.
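
As a sketch of how this option could be put to use, the following helper writes a minimal config file containing only the setting shown above. The function name write_history_config is hypothetical; all other parameters are left out so that their defaults apply.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal config file that only enables
# the experimental file revision support.
# $1 = path of the config file to create (an existing file is overwritten).
write_history_config() {
    cat > "$1" <<'EOF'
# enable BETA support for file revisions
include-history: true
EOF
}
```

With the volume mount used in the migration commands (`$(pwd)/confluence:/data`), calling `write_history_config ./confluence/config.yaml` makes the file available inside the container as /data/config.yaml, which is the path used in the --config example in the "Config file" section.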

Required MediaWiki extensions

The output generated by the tool contains certain elements that need additional extensions to be enabled.

Recommended MediaWiki extensions

These extensions are not strictly required but are recommended for full compatibility with the migrated content.

WikiMarkdown - Renders <markdown> tags produced from Confluence markdown macros

Manual post-import maintenance

Cleanup Categories

If the tool cannot migrate some content or functionality, it adds a category to the affected page so that you can fix the issue manually after the import:

Broken_link
Broken_user_link
Broken_page_link
Broken_image
Broken_layout
Broken_macro/<macro-name>

Not migrated

User identities
Various macros
Various layouts
Files of a space which cannot be assigned to a page

TODO

Reduce multiple linebreaks ( ) to one
Remove line breaks and arbitrary formatting (e.g. ) from headings
Mask external images (<img />)
Merge multiple <code> lines into <pre>
Remove bold/italic formatting from wikitext headings (e.g. === '''Some heading''' ===)
Fix unconverted HTML lists in wikitext (e.g. <ul><li>==== Lorem ipsum ====</li><li>''' [[Media:Some_file.pdf]]'''</li></ul><ul>)
Remove empty confluence storage format fragments (e.g.  , )
