Confluence REST API

ussc provides five functions that will be useful if/when you need to download data from Confluence:

ussc_confluence_table() to grab any table and return in the same format as on Confluence,
ussc_confluence_excel() for downloading Excel files and loading into R,
ussc_confluence_kpi_table() for grabbing the KPIs and cleaning the data so it is long and tidy,
ussc_confluence_version_history() to grab the version history of any table on Confluence (particularly useful for updates to the Publications calendar)
ussc_confluence_word_tables() to download tables within Word documents, load them into R and clean them.

To use these functions, you need to connect to the Confluence REST API. Grab an API key from https://confluence.atlassian.com/cloud/api-tokens-938839638.html. You can add the API key into your .renviron file by running usethis::edit_r_environ(). The confluence keys look like this in my .renviron file:

CONFLUENCE_URL = https://usscsydney.atlassian.net/wiki

CONFLUENCE_USERNAME = firstname.lastname@sydney.edu.au (add your own email)

CONFLUENCE_PASSWORD = KFkuYkNdkqSKEWfdrjODxxxx (please change your API key – note that this API key does not work as I changed the last four digits)

Save the .renviron file and restart R so the changes take effect.

Scrape HTML tables from Confluence

ussc_confluence_table() returns HTML tables that have been embedded in Confluence as tidy tibbles. There are three parameters - the ID of the Confluence page, your username and your password (API key). The username and password default to CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD in your .renviron file. You can add the values manually if you wish.

It is a fairly simple function to run. Say I want to grab the Key Performance Indicators from the Comms team. The page ID is 950239240.

comms_kpi <-  ussc_confluence_table("950239240")

This grabs all of the tables on the page. You can extract each table by running:

online_kpi_table1 <- comms_kpi[1][[1]]

head(online_kpi_table1)

FALSE                     x this_month v_last_month_percent v_12_mth_avg_percent
FALSE 1            Sessions     10,010                  8.5                -15.1
FALSE 2               Users      7,894                  9.1                 -2.9
FALSE 3           Pageviews     18,470                 11.3                -18.7
FALSE 4       Pages/session       1.85                  2.8                 -4.1
FALSE 5 Avg duration (secs)         93                  0.0                -13.1
FALSE 6       Bounce rate %      71.46                 -0.5                  3.6

Publications KPI table

If you want to grab the publication metrics data, run ussc_confluence_kpi_table(). This will download the data and clean it. Let’s take a look at the 2020 Q1 metrics.

pub_kpi <- ussc_confluence_kpi_table("1173357736")

head(pub_kpi)

FALSE    publication_date
FALSE 1 20 September 2019
FALSE 2 20 September 2019
FALSE 3 20 September 2019
FALSE 4 20 September 2019
FALSE 5 20 September 2019
FALSE 6 20 September 2019
FALSE                                                        report_title report_type
FALSE 1 Allied interest: Australia-US relations remain strong in new poll        Poll
FALSE 2 Allied interest: Australia-US relations remain strong in new poll        Poll
FALSE 3 Allied interest: Australia-US relations remain strong in new poll        Poll
FALSE 4 Allied interest: Australia-US relations remain strong in new poll        Poll
FALSE 5 Allied interest: Australia-US relations remain strong in new poll        Poll
FALSE 6 Allied interest: Australia-US relations remain strong in new poll        Poll
FALSE                     report_authors tracking_date            metric value
FALSE 1 Simon Jackman and Shaun Ratcliff        apr 20        page views   872
FALSE 2 Simon Jackman and Shaun Ratcliff        apr 20 unique page views   784
FALSE 3 Simon Jackman and Shaun Ratcliff        apr 20          avg time  3:34
FALSE 4 Simon Jackman and Shaun Ratcliff        feb 20        page views   370
FALSE 5 Simon Jackman and Shaun Ratcliff        feb 20 unique page views   315
FALSE 6 Simon Jackman and Shaun Ratcliff        feb 20          avg time  3:05

Version history

To see the version history of a table, run the function ussc_confluence_version_history(). You will be able to see when edits to the page were made and by whom.

pubs_calendar_version_history <- ussc_confluence_version_history("1020526593")

FALSE Please note: depending on how many edits have been made to the page, scraping the version history will take a while (~2 - 4 mins)... 
FALSE

head(pubs_calendar_version_history)

FALSE # A tibble: 6 x 13
FALSE   date       version_history message email name  title program status
FALSE   <date>               <dbl> <chr>   <chr> <chr> <chr> <chr>   <chr> 
FALSE 1 2019-10-25               2 ""      drew… Drew… AgTe… "I&E"   Layou…
FALSE 2 2019-10-25               1 ""      drew… Drew… AgTe… "I&E"   Publi…
FALSE 3 2019-11-22               3 ""      susa… Susa… AgTe… "I&E"   Layou…
FALSE 4 2020-01-21               6 ""      step… Step… AgTe… "I&E"   Layou…
FALSE 5 2020-01-22               7 ""      mati… Mati… AgTe… "I&E"   Relea…
FALSE 6 2019-10-25               1 ""      drew… Drew… Are … ""      In pr…
FALSE # … with 5 more variables: original_publication_date <chr>,
FALSE #   new_publication_date <chr>, author <chr>, approver <chr>, type <chr>

Download Excel files from Confluence

Another function, ussc_confluence_excel(), downloads all Excel files on a given page. Note: this function might take a while to run if you need to download many files. I have only tested this function on Firefox. It also assumes that you have set your settings so that the files download automatically.

one_file <- ussc_confluence_excel(id = "950239621")

The Excel file has 4 sheets. We can extract the sheets below.

publications <- one_file[["Publications"]]
media <- one_file[["Media"]]
events <- one_file[["Events"]]
meetings_etc <- one_file[["Meetings and other"]]


head(publications)

FALSE # A tibble: 6 x 9
FALSE   publication_type author_s title source location date                vertical
FALSE   <chr>            <chr>    <chr> <chr>  <chr>    <dttm>              <chr>   
FALSE 1 2018             <NA>      <NA> <NA>   <NA>     NA                  <NA>    
FALSE 2 Debate Paper     Stephen… "Wil… USSC   Austral… 2018-09-03 00:00:00 T&I     
FALSE 3 Explainer        Matilda… "Wom… USSC   Austral… 2018-09-19 00:00:00 FP&D_co…
FALSE 4 Journal article  Stephen… "Mon… Econo… Austral… 2018-09-27 00:00:00 T&I     
FALSE 5 Brief            Charles… "Ame… USSC   Austral… 2018-10-04 00:00:00 FP&D    
FALSE 6 Report           Spencer… "Aus… USSC   Austral… 2018-10-09 00:00:00 I&E     
FALSE # … with 2 more variables: topic <chr>, feedback <chr>

USSC Confluence functions

Zoe Meers

19/03/2019

Confluence REST API

Scrape HTML tables from Confluence

Publications KPI table

Version history

Download Excel files from Confluence