40 changes: 40 additions & 0 deletions dayo-web-scraping/Dockerfile
@@ -0,0 +1,40 @@
# ex: FROM k8scc01covidacr.azurecr.io/minimal-notebook-cpu:5ef877ea13789f64594c219ef0a302dc97c21bb4
ARG BASE_CONTAINER
FROM $BASE_CONTAINER
USER root

RUN apt-get update && apt-get install -y software-properties-common --no-install-recommends \
&& apt-get install -y chromium-browser chromium-browser-l10n chromium-codecs-ffmpeg \
&& ln -s /usr/bin/chromium-browser /usr/bin/google-chrome \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN wget -q https://chromedriver.storage.googleapis.com/85.0.4183.87/chromedriver_linux64.zip && \
unzip chromedriver_linux64.zip && \
rm chromedriver_linux64.zip && \
chmod a+x chromedriver && \
mv chromedriver /usr/bin/ && \
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add - && \
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list

RUN sudo apt-get update && \
sudo apt-get -y install google-chrome-stable && \
sudo apt-get clean

RUN pip install --no-cache-dir 'selenium==3.141.0' && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER

RUN pip install --no-cache-dir --upgrade pip 'playwright==1.19.1' && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER

RUN sudo apt-get update && \
sudo playwright install-deps

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["start-custom.sh"]
5 changes: 5 additions & 0 deletions dayo-web-scraping/README.md
@@ -0,0 +1,5 @@
# Summary

Custom Jupyter server built with Chrome, Playwright, and Selenium for web scraping. Extends a pinned version of the `minimal-notebook-cpu` image.
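
A quick smoke test for the Playwright side of this image might look like the sketch below. It rests on assumptions that are not stated in the PR: the Dockerfile runs `playwright install-deps` but never downloads Playwright's bundled browsers, so the sketch points Playwright at the system `google-chrome-stable` through `channel="chrome"`. The helper names `launch_options` and `fetch_title` are hypothetical, and the `--no-sandbox` flag is borrowed from the Selenium test in the `yoon-minimal-web-scraping` Dockerfile.

```python
def launch_options(headless=True):
    """Keyword arguments for chromium.launch(); the values are assumptions."""
    return {
        "channel": "chrome",  # assumption: drive the system google-chrome-stable
        "headless": headless,
        "args": ["--no-sandbox"],  # assumption: same flag as the Selenium test
    }


def fetch_title(url):
    """Open url in Chrome and return the page title."""
    # Imported lazily so launch_options() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_options())
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            # Always release the browser process, even if navigation fails.
            browser.close()
```

Running `fetch_title("https://www.google.com")` from a plain `python` session in a terminal on the server is probably the simplest check. Playwright's sync API is not designed to run inside an already running asyncio loop, so it may refuse to start from a notebook cell.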


42 changes: 42 additions & 0 deletions yoon-minimal-web-scraping/Dockerfile
@@ -0,0 +1,42 @@
# ex: FROM k8scc01covidacr.azurecr.io/minimal-notebook-cpu:5ef877ea13789f64594c219ef0a302dc97c21bb4
ARG BASE_CONTAINER
FROM $BASE_CONTAINER
USER root

RUN apt-get update && apt-get install -y software-properties-common --no-install-recommends \
&& apt-get install -y chromium-browser chromium-browser-l10n chromium-codecs-ffmpeg \
&& ln -s /usr/bin/chromium-browser /usr/bin/google-chrome \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

RUN wget -q https://chromedriver.storage.googleapis.com/85.0.4183.87/chromedriver_linux64.zip && \
unzip chromedriver_linux64.zip && \
rm chromedriver_linux64.zip && \
chmod a+x chromedriver && \
mv chromedriver /usr/bin/ && \
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add - && \
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list

RUN sudo apt-get update && \
sudo apt-get -y install google-chrome-stable && \
sudo apt-get clean

RUN pip install --no-cache-dir 'selenium==3.141.0' && \
fix-permissions $CONDA_DIR && \
fix-permissions /home/$NB_USER

# Configure container startup
EXPOSE 8888
USER jovyan
ENTRYPOINT ["tini", "--"]
CMD ["start-custom.sh"]

# To test in Python:
# from selenium import webdriver
# from selenium.webdriver.chrome.options import Options
#
# chrome_options = Options()
# chrome_options.add_argument('--headless')
# chrome_options.add_argument('--no-sandbox')
# # `options=` replaces the `chrome_options=` keyword, which is deprecated in selenium 3.141.0
# d = webdriver.Chrome(options=chrome_options)
# d.get("https://www.google.com")
# d.quit()
23 changes: 23 additions & 0 deletions yoon-minimal-web-scraping/README.md
@@ -0,0 +1,23 @@
# Summary

Custom Jupyter server built with Chrome and Selenium for web scraping. Extends a pinned version of the `minimal-notebook-cpu` image.

# Existing versions

To use one of these images, paste its name into the custom notebook image field on the `New Server` page.

* k8scc01covidacr.azurecr.io/yoon-minimal-web-scraping:2020-09-17_1

# Build/Update Instructions

(You must have permission to push to the `k8scc01covidacr` registry.)

```
# Edit build.sh to set image VERSION
# Edit build.sh to pin to the desired minimal-notebook-cpu image

az acr login --name k8scc01covidacr

./build.sh
# Add to Existing versions above if sharing with others
```
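
Picking a `VERSION` that does not collide with an existing tag can be sketched as below. This assumes the `YYYY-MM-DD_N` pattern suggested by the existing tag `2020-09-17_1`. The helper `next_version` is hypothetical and not part of `build.sh`, and the list of existing tags has to be supplied by hand, for example from the "Existing versions" section.

```python
from datetime import date


def next_version(existing_tags, today=None):
    """Return the next unused tag of the form YYYY-MM-DD_N for the given day."""
    day = (today or date.today()).isoformat()
    prefix = day + "_"
    # Collect the numeric suffixes already used for this day.
    used = [
        int(tag[len(prefix):])
        for tag in existing_tags
        if tag.startswith(prefix) and tag[len(prefix):].isdecimal()
    ]
    # Start at 1 when nothing has been published for this day yet.
    return f"{prefix}{max(used, default=0) + 1}"


print(next_version(["2020-09-17_1"], date(2020, 9, 17)))  # 2020-09-17_2
```

The result can then be copied into the `VERSION` variable in `build.sh` before building.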
10 changes: 10 additions & 0 deletions yoon-minimal-web-scraping/build.sh
@@ -0,0 +1,10 @@
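#!/usr/bin/env bash
# Stop on the first error so that a failed build is never pushed
set -euo pipefail
# VERSION can be any string, but do not reuse a tag that has already been pushed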
VERSION="YYYY-MM-DD_VERSIONNUMBER"
IMAGE_TAG="k8scc01covidacr.azurecr.io/yoon-minimal-web-scraping:$VERSION"
BASE_CONTAINER="k8scc01covidacr.azurecr.io/minimal-notebook-cpu:5ef877ea13789f64594c219ef0a302dc97c21bb4"
docker build -t "$IMAGE_TAG" --build-arg BASE_CONTAINER="$BASE_CONTAINER" .
# docker run -p 8888:8888 "$IMAGE_TAG"

# Must be logged into az acr (az acr login --name k8scc01covidacr)
docker push "$IMAGE_TAG"