Add greenmask for randomized database imports #1502

Open
danlamanna wants to merge 1 commit into master from greenmask

@danlamanna
Member

This adds greenmask (https://docs.greenmask.io/latest) as a development utility, which makes it easy to get a small subset of real, redacted production data.

@annehaley can you give dev/greenmask-dump.sh a try locally and let me know how it works? You may want to run the server with export DJANGO_ISIC_FAKE_STORAGE=placeholder since there won't be any real images.

@danlamanna danlamanna marked this pull request as draft June 1, 2026 05:48
@annehaley
Contributor

I was able to run the script and checked the results in the client. I see this in the images list view:
[image]

I'm assuming that these blank images are expected for this fake data (though maybe the "Showing 0 results" should be fixed?). Each "image" has associated info and metadata.

  "venv": ["uv", "sync", "--all-extras", "--all-groups"],
- "npm": ["npm", "install"]
+ "npm": ["npm", "install"],
+ "greenmask": "curl -fsSL https://greenmask.io/install.sh | sh -s -- -y -v v0.2.21"
Member

Can this be put into the Dockerfile?

Member Author

@danlamanna danlamanna Jun 4, 2026

It could, but it depends on features e.g. heroku and psql. Ugh, no it doesn't. What is the heuristic for what should be in the image vs somewhere else? If there were a greenmask feature I assume we'd use that like we do with others.

@danlamanna
Member Author

> I'm assuming that these blank images are expected for this fake data

Yeah, there are two options: either export DJANGO_ISIC_FAKE_STORAGE=placeholder to get placeholder images, or export DJANGO_ISIC_FAKE_STORAGE=proxy to get the real images at the cost of slower load times. The latter requires auth info that I'll send to you offline.
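
As a rough illustration of the two modes, here is a minimal sketch of how a settings module might read the variable. Only the variable name DJANGO_ISIC_FAKE_STORAGE and the values placeholder and proxy come from this thread. The helper fake_storage_mode, the case-insensitive matching, and the fallback to None when the variable is unset are all assumptions, not the project's actual code.

```python
import os
from typing import Mapping, Optional

# The two values mentioned in this thread; this sketch rejects anything else.
VALID_MODES = {"placeholder", "proxy"}


def fake_storage_mode(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the requested fake-storage mode, or None when the variable is unset.

    Hypothetical helper: the real settings code may behave differently.
    """
    raw = environ.get("DJANGO_ISIC_FAKE_STORAGE", "").strip().lower()
    if not raw:
        return None
    if raw not in VALID_MODES:
        raise ValueError(
            f"DJANGO_ISIC_FAKE_STORAGE must be one of {sorted(VALID_MODES)}, got {raw!r}"
        )
    return raw
```

Failing loudly on an unknown value is a design choice in this sketch, so that a typo does not silently fall back to some other storage.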

> maybe the "Showing 0 results" should be fixed?

We compute most of our counts with elasticsearch so this is unfortunately expected. ./manage.py populate_elasticsearch should fix it.
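
To see why the label reads 0 right after an import, here is a toy model, not the project's code. The database and the search index are separate stores, so a count served from the index stays at zero until something copies the rows over. The names database, search_index, indexed_count, and populate_index are hypothetical stand-ins; populate_index plays the role that ./manage.py populate_elasticsearch plays here.

```python
# Toy stand-ins for the two stores; neither is a real database or Elasticsearch client.
database = [{"id": 1}, {"id": 2}, {"id": 3}]  # rows restored by the import
search_index = {}  # empty until explicitly populated


def indexed_count() -> int:
    """Count as the UI would see it when counts come from the search index."""
    return len(search_index)


def populate_index() -> None:
    """Copy every database row into the index, keyed by id."""
    for row in database:
        search_index[row["id"]] = row


before = indexed_count()  # the index is still empty at this point
populate_index()
after = indexed_count()  # now equals the number of database rows
```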

@annehaley
Contributor

Thanks @danlamanna for helping me resolve those issues. I can now use the placeholder or proxy mode for images, and running the populate_elasticsearch command worked to fill in that "0 results" label with the real number.

@danlamanna danlamanna force-pushed the greenmask branch 8 times, most recently from 15de9e2 to c390129 Compare June 4, 2026 17:59
@danlamanna danlamanna marked this pull request as ready for review June 4, 2026 18:30
3 participants