vignettes/ussc_confluence.Rmd
ussc_confluence.Rmd
ussc
provides five functions that will be useful if/when you need to download data from Confluence:
ussc_confluence_table()
to grab any table and return in the same format as on Confluence,ussc_confluence_excel()
for downloading Excel files and loading into R
,ussc_confluence_kpi_table()
for grabbing the KPIs and cleaning the data so it is long and tidy,ussc_confluence_version_history()
to grab the version history of any table on Confluence (particularly useful for updates to the Publications calendar)ussc_confluence_word_tables()
to download tables within Word documents, load them into R
and clean them.To use these functions, you need to connect to the Confluence REST API. Grab an API key from https://confluence.atlassian.com/cloud/api-tokens-938839638.html. You can add the API key into your .renviron file by running usethis::edit_r_environ()
. The confluence keys look like this in my .renviron file:
CONFLUENCE_URL = https://usscsydney.atlassian.net/wiki
CONFLUENCE_USERNAME = firstname.lastname@sydney.edu.au (add your own email)
CONFLUENCE_PASSWORD = KFkuYkNdkqSKEWfdrjODxxxx (please change your API key – note that this API key does not work as I changed the last four digits)
Save the .renviron file and restart R
so the changes take effect.
ussc_confluence_table()
returns HTML tables that have been embedded in Confluence as tidy tibbles. There are three parameters - the ID of the Confluence page, your username and your password (API key). The username and password default to CONFLUENCE_USERNAME and CONFLUENCE_PASSWORD in your .renviron file. You can add the values manually if you wish.
It is a fairly simple function to run. Say I want to grab the Key Performance Indicators from the Comms team. The page ID is 950239240.
comms_kpi <- ussc_confluence_table("950239240")
This grabs all of the tables on the page. You can extract each table by running:
online_kpi_table1 <- comms_kpi[1][[1]] head(online_kpi_table1)
FALSE x this_month v_last_month_percent v_12_mth_avg_percent
FALSE 1 Sessions 10,010 8.5 -15.1
FALSE 2 Users 7,894 9.1 -2.9
FALSE 3 Pageviews 18,470 11.3 -18.7
FALSE 4 Pages/session 1.85 2.8 -4.1
FALSE 5 Avg duration (secs) 93 0.0 -13.1
FALSE 6 Bounce rate % 71.46 -0.5 3.6
If you want to grab the publication metrics data, run ussc_confluence_kpi_table()
. This will download the data and clean it. Let’s take a look at the 2020 Q1 metrics.
pub_kpi <- ussc_confluence_kpi_table("1173357736") head(pub_kpi)
FALSE publication_date
FALSE 1 20 September 2019
FALSE 2 20 September 2019
FALSE 3 20 September 2019
FALSE 4 20 September 2019
FALSE 5 20 September 2019
FALSE 6 20 September 2019
FALSE report_title report_type
FALSE 1 Allied interest: Australia-US relations remain strong in new poll Poll
FALSE 2 Allied interest: Australia-US relations remain strong in new poll Poll
FALSE 3 Allied interest: Australia-US relations remain strong in new poll Poll
FALSE 4 Allied interest: Australia-US relations remain strong in new poll Poll
FALSE 5 Allied interest: Australia-US relations remain strong in new poll Poll
FALSE 6 Allied interest: Australia-US relations remain strong in new poll Poll
FALSE report_authors tracking_date metric value
FALSE 1 Simon Jackman and Shaun Ratcliff apr 20 page views 872
FALSE 2 Simon Jackman and Shaun Ratcliff apr 20 unique page views 784
FALSE 3 Simon Jackman and Shaun Ratcliff apr 20 avg time 3:34
FALSE 4 Simon Jackman and Shaun Ratcliff feb 20 page views 370
FALSE 5 Simon Jackman and Shaun Ratcliff feb 20 unique page views 315
FALSE 6 Simon Jackman and Shaun Ratcliff feb 20 avg time 3:05
To see the version history of a table, run the function ussc_confluence_version_history()
. You will be able to see when edits to the page were made and by whom.
pubs_calendar_version_history <- ussc_confluence_version_history("1020526593")
FALSE Please note: depending on how many edits have been made to the page, scraping the version history will take a while (~2 - 4 mins)...
FALSE
head(pubs_calendar_version_history)
FALSE # A tibble: 6 x 13
FALSE date version_history message email name title program status
FALSE <date> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
FALSE 1 2019-10-25 2 "" drew… Drew… AgTe… "I&E" Layou…
FALSE 2 2019-10-25 1 "" drew… Drew… AgTe… "I&E" Publi…
FALSE 3 2019-11-22 3 "" susa… Susa… AgTe… "I&E" Layou…
FALSE 4 2020-01-21 6 "" step… Step… AgTe… "I&E" Layou…
FALSE 5 2020-01-22 7 "" mati… Mati… AgTe… "I&E" Relea…
FALSE 6 2019-10-25 1 "" drew… Drew… Are … "" In pr…
FALSE # … with 5 more variables: original_publication_date <chr>,
FALSE # new_publication_date <chr>, author <chr>, approver <chr>, type <chr>
Another function, ussc_confluence_excel()
, downloads all Excel files on a given page. Note: this function might take a while to run if you need to download many files. I have only tested this function on Firefox. It also assumes that you have set your settings so that the files download automatically.
one_file <- ussc_confluence_excel(id = "950239621")
The Excel file has 4 sheets. We can extract the sheets below.
publications <- one_file[["Publications"]] media <- one_file[["Media"]] events <- one_file[["Events"]] meetings_etc <- one_file[["Meetings and other"]] head(publications)
FALSE # A tibble: 6 x 9
FALSE publication_type author_s title source location date vertical
FALSE <chr> <chr> <chr> <chr> <chr> <dttm> <chr>
FALSE 1 2018 <NA> <NA> <NA> <NA> NA <NA>
FALSE 2 Debate Paper Stephen… "Wil… USSC Austral… 2018-09-03 00:00:00 T&I
FALSE 3 Explainer Matilda… "Wom… USSC Austral… 2018-09-19 00:00:00 FP&D_co…
FALSE 4 Journal article Stephen… "Mon… Econo… Austral… 2018-09-27 00:00:00 T&I
FALSE 5 Brief Charles… "Ame… USSC Austral… 2018-10-04 00:00:00 FP&D
FALSE 6 Report Spencer… "Aus… USSC Austral… 2018-10-09 00:00:00 I&E
FALSE # … with 2 more variables: topic <chr>, feedback <chr>