Step-by-Step Guide to Use R and Selenium to Scrape Empleos Publicos

Using R, selenium and purrr to organize hundreds of HTML sections into one table.
Author

Mauricio “Pachá” Vargas S.

Published

August 20, 2025

Because of delays with my scholarship payment, if this post is useful to you I kindly ask a minimal donation on Buy Me a Coffee. It shall be used to continue my Open Source efforts. The full explanation is here: A Personal Message from an Open Source Contributor.

Motivation

My friend Nicolas Didier asked me about reading Empleos Publicos with R or Python. Here is a short example for him and anybody who may benefit from reading this.

The following steps were adapted from a tutorial I taught at the University of Michigan (GO BLUE!) in 2023.

Required R packages

  • RSelenium: R-Selenium integration
  • rvest: HTML processing
  • dplyr: to load the pipe operator (can be used later for data cleaning)
  • purrr: iteration (i.e., repeated operations)

I installed RSelenium from the R console:

if (!require(RSelenium)) install.packages("RSelenium")

# or

remotes::install_github("ropensci/RSelenium")

For the rest of the packages:

if (!require(rvest)) install.packages("rvest")
if (!require(dplyr)) install.packages("dplyr")
if (!require(purrr)) install.packages("purrr")

Installing Selenium and Chrome/Chromium

Note for Ubuntu/Debian users: We need to check that chrome or chromium is installed in our system. One of the many options is to use the bash console.

sudo add-apt-repository ppa:savoury1/chromium
sudo apt update
sudo apt install chromium-browser
sudo apt install chromium-chromedriver

Not using the PPA will install the snap version of Chromium, which is not compatible with Selenium.

I tried to start Selenium as it is mentioned in the official guide and it did not work.

I had to install Chromium. I am on Manjaro and I ran sudo pacman -S chromium. Windows/Mac users can use Google Chrome.

An extra requirement was to download Selenium Server. Based on this, I started by creating a directory to store the data for this post by typing this in VS Code terminal:

mkdir -p /tmp/didier-example
cd /tmp/didier-example

Then I opened R witn R and downloaded the JAR file:

url_jar <- "https://github.com/SeleniumHQ/selenium/releases/download/selenium-3.9.1/selenium-server-standalone-3.9.1.jar"
sel_jar <- "selenium-server-standalone-3.9.1.jar"

if (!file.exists(sel_jar)) {
  download.file(url_jar, sel_jar)
}

I had to run Selenium from a new terminal:

cd /tmp/didier-example
java -jar selenium-server-standalone-3.9.1.jar

Back to the R terminal, I was finally in condition to control the browser from R:

library(RSelenium)
library(rvest)
library(dplyr)
library(purrr)

rmDr <- remoteDriver(port = 4444L, browserName = "chrome")

rmDr$open(silent = TRUE)

url <- "https://www.empleospublicos.cl"

rmDr$navigate(url)

This should display a new Chrome/Chromium window that says “Chrome is being controlled by automated test software”.

Scraping the data

Using the browser’s inspector (ctrl + shift + i), I explored the page to see that the search bar corresponds to:

<input class="buscador-principal search form-control buscador-movil" name="q" type="search" autocomplete="off" placeholder="Ingresa el cargo, comuna o institución" id="buscadorprincipal">

For example, I can search for “Ministerio de Salud” because there were many posts by that organization on the landing page:

search_box <- rmDr$findElement(using = "id", value = "buscadorprincipal")
search_box$sendKeysToElement(list("Ministerio de Salud", key = "enter"))

That typed “Ministerio de Salud” and clicked search on my behalf. Inspecting the results I see that each job offer starts with

<div class="items col-md-4 col-lg-4 postulacion ...

The first offer listed is this:

<div class="items col-md-4 col-lg-4 postulacion otro otro eepp region7renta3calidad2 busqueda "><div class="item"><div class="top"><div class="label label-estado"><i class="fa fa-circle circulo-status1" aria-hidden="true"></i> Postulación hasta 30/09/2025 23:59:00</div><h3><a target="_blank" href="https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo" onclick="ga('send', 'event', 'convocatorias', 'Medico (a) especialista en Anestesiología 44 horas | Servicio de Salud Maule / Hospital de Constitución', 'eepp');">Medico (a) especialista en Anestesiología 44 horas</a></h3><p>Servicio de Salud Maule / Hospital de Constitución</p></div><hr><div class="cnt"><p>Ministerio de Salud</p><p>Constitución</p><br><div class="alert alert-primer"><i class="fa fa-address-card" aria-hidden="true"></i>  No pide experiencia</div><div class="row card-footer"><div class="col-xs-9 col-md-8 text-left"><a class="cronograma btn " url="https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo" onclick="return false;" href="#" title="Ver Cronograma de la Convocatoria"><i class="fa fa-calendar-days"></i> Calendarización</a>
        <div class="compartir-social">      
            <div class="row">
                <div class="col-xs-3 col-md-4">
                    <a class="btn" onclick="enviarRS('t', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" target="_blank" title="Compartir en Twitter"><i class="fa-brands fa-square-x-twitter fa-xl" aria-hidden="true"></i></a>
                </div>
                <div class="col-xs-3 col-md-4">
                    <a class="btn" onclick="enviarRS('f', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" target="_blank" title="Compartir en Facebook"><i class="fa-brands fa-square-facebook fa-xl" aria-hidden="true"></i></a>
                </div>
                <div class="col-xs-3 col-md-4">
                    <a class="btn" onclick="enviarRS('l', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" target="_blank" title="Compartir en Linkedin"><i class="fa-brands fa-linkedin fa-xl" aria-hidden="true"></i></a>
                </div>
                <div class="col-xs-3 col-md-4">
                    <a class="btn whatsapp-link visible-xs visible-sm" title="Compartir en Whatsapp" onclick="enviarRS('w', 'https://www.empleospublicos.cl/pub/convocatorias/convpostularavisoTrabajo.aspx?i=130648&amp;c=0&amp;j=0&amp;tipo=convpostularavisoTrabajo', 'Medico (a) especialista en Anestesiología 44 horas Servicio de Salud Maule / Hospital de Constitución'); return false;" href="#" data-action="share/whatsapp/share"><i class="fa-brands fa-square-whatsapp fa-xl" aria-hidden="true"></i></a>
                </div>
            </div>
        </div>
    <div class="row"><div class="col-md-12 card-footer-contenido "></div></div></div></div></div></div></div>
html <- read_html(rmDr$getPageSource()[[1]])

offers <- html %>%
  html_nodes("div.items")

offers_tbl <- map_df(offers, function(offer) {
  # Extract position (job title)
  position <- offer %>%
    html_node("h3 a") %>%
    html_text(trim = TRUE)
  
  # Extract organization (usually the first <p> inside .top)
  organization <- offer %>%
    html_node(".top p") %>%
    html_text(trim = TRUE)
  
  # Extract city (the second <p> inside .cnt)
  city <- offer %>%
    html_nodes(".cnt p") %>%
    .[2] %>%
    html_text(trim = TRUE)
  
  tibble(
    position = position,
    organization = organization,
    city = city
  )
})

The result has the following structure:

offers_tbl
# A tibble: 552 × 3
   position                                                   organization city 
   <chr>                                                      <chr>        <chr>
 1 Medico (a) especialista en Anestesiología 44 horas         Servicio de… Cons…
 2 Titulares de la Planta Profesional Ley 18.834              Servicio de… Valp…
 3 ENFERMERA-O, JORNADA DIURNA, GRADO 12, PARA SERVICIO CLÍN… Servicio de… Reco…
 4 Psiquiatra infanto-juvenil sistema de atención intersecto… Servicio de… La P…
 5 Neurólogo(a) adulto GES Alzheimer y otras demencias        Servicio de… Puen…
 6 Médico(a) especialista en Neurología Infantil Hospital de… Servicio de… Cast…
 7 Arquitecto de Software                                     Central de … Ñuñoa
 8 TENS OPERADOR DE EQUIPOS DE ESTERILIZACIÓN                 Servicio de… Peña…
 9 (850-2892) Médico Especialista Broncopulmonar o Internist… Servicio de… Talc…
10 Enfermero(a) Clínico(a) Atención Abierta y Cerrada         Servicio de… Huas…
glimpse(offers_tbl)
> glimpse(offers_tbl)
Rows: 552
Columns: 3
$ position     <chr> "Medico (a) especialista en Anestesiología 44 horas", "Ti…
$ organization <chr> "Servicio de Salud Maule / Hospital de Constitución", "Se…
$ city         <chr> "Constitución", "Valparaíso", "Recoleta", "La Pintana", "…

I know this is a simple example but should allow different kinds of exploration and data extraction. I hope it helps.