How to recover data from disconnected ppt charts

2015, May 01    

How to recover data from disconnected ppt charts

For those who handle lots of powerpoint files at work, few things are more annoying than ppt charts that have lost links with their original data set. Luckily you can still figure out the original data by hovering over the points, but what would you do with a chart that has hundreds, if not thousands of data points?

A few days ago I was in that situation where I had a disconnected grouped bar chart. It had 4 series times 7 categories, so 28 data points. This was enough to irritate me, and I found this link.

This guy’s blog explains the solution well, but I thought I could do something with R. So here’s what I’ve come up with. If you have R installed in your mac, you can easily do it.

1. let’s get the ingredients ready.


Here I have a ppt file with 4 different types of charts; a line chart, a pie chart, a bar chart and a group bar chart.

2. save the ppt file, and change it file type from pptx to zip.


It sounds pretty weird, but your computer will allow that to happen.

3. unzip the newly modified zip file.


In the folder, you’ll see a bunch of folders and files.. and among them, you’ll find [charts] folder inside [ppt] folder. That’s where our charts are. 4 xml files there!

4. put them in a folder with ppt_chart_recovery_f.R (again, you can download it from here). Make sure you have no other xml files in the folder, as my R script is designed to parse all the xml files it can find in its folder. Like this.


5. open terminal and go to the test folder. as my [test] folder is in the [desktop] folder, I just need to type “cd desktop/test"

6. run the r code to get the result. type “r” to activate r, and type “source(“ppt_chart_recovery_f.R”). Within a few seconds, the job is done.


It shows some messages that go “Namespace prefix c…” but you can ignore it.

7. check out the results.


Voila! four new csv files! click and hit space to check out what they look like.


Job done!


#loading files
current_files <- dir()
file_vec <- vector()
for (i in 1:length(current_files)) {
  file <- current_files[i]
  if (grepl(".xml", file)) {
    if (!(file %in% file_vec)) {
      file_vec <- c(file_vec, file)
ser_list <- list()
for (file in 1:length(file_vec)) {
  doc <- xmlTreeParse(file_vec[file])
  root <- xmlRoot(doc)
  #chart type
  chart_type <- names(doc$doc$children$chartSpace["chart"]$chart["plotArea"]$plotArea[2])
  #fetching series
  ser = xpathApply(root, "//c:ser/c:tx/c:strRef/c:strCache/c:pt/c:v")
  ser_vec = vector()
  for (i in 1:length(ser)) {
    ser_item <- toString.XMLNode(ser[[i]])
    ser_item <- gsub("<.*?>","", ser_item)
    ser_item <- gsub(" $","", ser_item, perl=T)
    ser_vec <- c(ser_vec, ser_item)
  #fetching categories
  cat0 <- xpathApply(root, "//c:cat")
  cat_val <- names(cat0[[1]])
  if (cat_val == "c:strRef") {
    cat <- xpathApply(root, "//c:cat/c:strRef/c:strCache/c:pt/c:v")
  } else {
    cat <- xpathApply(root, "//c:cat/c:numRef/c:numCache/c:pt/c:v") 
  cat_vec = vector()
  for (i in 1:length(cat)) {
    cat_item <- toString.XMLNode(cat[[i]])
    cat_item <- gsub("<.*?>","", cat_item)
    cat_item <- gsub(" $","", cat_item, perl=T)
    if (!(cat_item %in% cat_vec)) {
      cat_vec <- c(cat_vec, cat_item)
  #fetching values
  value = xpathApply(root, "//c:ser/c:val/c:numRef/c:numCache/c:pt/c:v")
  val_vec = vector()
  for (i in 1:length(value)) {
    value_item <- toString.XMLNode(value[[i]])
    value_item <- gsub("<.*?>","", value_item)
    value_item <- gsub(" $","", value_item, perl=T)
    value_item <- as.numeric(value_item)
    val_vec <- c(val_vec, value_item)
  #creating matrix
  final <- matrix(val_vec,nrow=length(cat_vec),ncol=length(ser_vec))
  rownames(final) <- cat_vec
  colnames(final) <- ser_vec
  #export as a csv file
  filename <- paste("done_", "chart", file, sep="")
  filename <- paste(filename, ".csv", sep="")
  write.csv(final, filename)