Written by Tony Gray
Have you ever needed to pull data from lists residing on multiple SharePoint sites into Power BI? How did that go for you? Depending on your tool or technology of choice, you may be weighing up options like:
- Create a manual dataset in your reporting tool to look at each site and then append all the data together?
- Create a complex function in your reporting tool to loop through all the SharePoint sites and pull through the list data?
Funnily enough, you may not think this is an issue at all. After all, it’s just data residing in a few sites! Just use the SharePoint connector in your reporting tool and connect to those sites. Well… yes, this would work well. But what happens if the data resides in hundreds of sites?
This blog will cover how to obtain data from multiple SharePoint sites and then centralise all that data into one place without the need for additional databases or other Azure infrastructure. All that is required is your SharePoint sites and Power Automate.
The illustration below shows a site collection that may have many sub-sites created within it; at the top level we have a simple SharePoint list or document library which stores the data collected from all the sites. The Power Automate process performs the following high-level steps:
- Queries the site collection web API to get a list of all the sub-sites created;
- Proceeds to loop through each sub-site and query each list that is created in the SharePoint sub-site;
- The response from each list API query is a JSON payload, which is then pushed into either a list or document library at the top level;
- The process loops through each sub-site and list until it finishes;
- By the end of the process, you will have a centralised list or document library containing all the list data, either as .json files in a document library or as list items with a field storing the JSON payload; and
- The reporting tool of choice can then consume the JSON objects in the list/library at the site collection level to pull through all the centralised site data.
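The steps above can be sketched in a few lines of code. The REST endpoints shown (`/_api/web/webs` for sub-sites, `/_api/web/lists/getbytitle(...)/items` for list items) are the standard SharePoint ones; the site URLs, list name and canned responses below are illustrative stand-ins so the loop can be followed end to end without an authenticated connection:

```python
import json

# Stand-in for an authenticated GET against the SharePoint REST API.
# A real Power Automate flow would use the "Send an HTTP request to
# SharePoint" action against these same endpoints; here the responses
# are simulated with fixed sample data.
FAKE_API = {
    "https://contoso.sharepoint.com/sites/root/_api/web/webs": {
        "value": [{"Url": "https://contoso.sharepoint.com/sites/root/siteA"},
                  {"Url": "https://contoso.sharepoint.com/sites/root/siteB"}]
    },
    "https://contoso.sharepoint.com/sites/root/siteA/_api/web/lists/getbytitle('Tasks')/items": {
        "value": [{"Title": "Task 1"}, {"Title": "Task 2"}]
    },
    "https://contoso.sharepoint.com/sites/root/siteB/_api/web/lists/getbytitle('Tasks')/items": {
        "value": [{"Title": "Task 3"}]
    },
}

def rest_get(url):
    return FAKE_API[url]

def collect_list_payloads(root_site, list_title):
    """Loop over every sub-site and store each list's JSON response centrally."""
    central_store = {}  # stands in for the top-level list/document library
    subsites = rest_get(f"{root_site}/_api/web/webs")["value"]
    for web in subsites:
        items_url = f"{web['Url']}/_api/web/lists/getbytitle('{list_title}')/items"
        payload = rest_get(items_url)
        # In the real flow, this JSON is written to the central library
        # as a .json file (or list item) named after its sub-site.
        central_store[web["Url"]] = json.dumps(payload)
    return central_store

store = collect_list_payloads("https://contoso.sharepoint.com/sites/root", "Tasks")
```

The reporting tool then only needs to read `central_store` (the top-level list/library), rather than hundreds of individual sites.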
Of course, there is a whole bunch of logic that needs to be built into the Power Automate process to cover scenarios like:
- What happens if the list you are trying to query for each site doesn’t exist?
- Do we need to query all the sites every time? Can’t we just look at the last modified date of each site and pull through only the latest data?
- What happens if I’ve created a new list in some sites but not others?
- What happens if I delete data from the lists / delete a list entirely?
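The first scenario, a missing list, is a good example of the defensive logic needed. In the real flow the list query's HTTP status is checked (a 404 means the sub-site never had that list created) and the site is skipped rather than failing the whole run. A minimal sketch, with a dict lookup standing in for the HTTP call and `None` playing the role of a 404:

```python
# Illustrative site data: siteB deliberately has no 'Tasks' list.
SITE_LISTS = {
    "siteA": {"value": [{"Title": "Task 1"}]},
    "siteC": {"value": []},
}

def collect_with_skip(site_names):
    """Collect list payloads, skipping sites where the list doesn't exist."""
    collected, skipped = {}, []
    for site in site_names:
        payload = SITE_LISTS.get(site)  # None stands in for an HTTP 404
        if payload is None:
            skipped.append(site)        # record it and move on to the next site
            continue
        collected[site] = payload
    return collected, skipped

collected, skipped = collect_with_skip(["siteA", "siteB", "siteC"])
```

The `skipped` list can also be logged or emailed so someone knows which sites are missing the expected list.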
Of course, like any solution, there are some drawbacks which you will need to be aware of when building out a process like this:
- This is not a ‘real-time’ refresh solution. This process will need to be scheduled to run a set number of times per day or week, depending on the business’s data currency requirements;
- If you are developing this process to get data from thousands of different sites, you may want to utilise Azure offerings like Logic Apps, where there are more configuration options and a “Managed way” to do things. We’ve also found that Logic Apps can run heaps faster (depending on the resources assigned); and
- Always consider implementing additional logic to do an incremental refresh in your Power Automate process rather than getting all sub-site data every time. This may not be an issue if there are only 20–50 sites.
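An incremental refresh can lean on the `LastItemModifiedDate` property that SharePoint returns for each web: compare it to the timestamp of the previous run and only re-pull sites that have changed. A hedged sketch, with illustrative site data and last-run marker:

```python
from datetime import datetime

def _parse(iso_utc):
    """Parse an ISO-8601 UTC timestamp of the form ...T..Z."""
    return datetime.fromisoformat(iso_utc.replace("Z", "+00:00"))

# Illustrative sub-site metadata as returned by /_api/web/webs.
SUBSITES = [
    {"Url": "siteA", "LastItemModifiedDate": "2024-03-01T08:00:00Z"},
    {"Url": "siteB", "LastItemModifiedDate": "2024-03-10T12:30:00Z"},
    {"Url": "siteC", "LastItemModifiedDate": "2024-03-12T09:15:00Z"},
]

def sites_to_refresh(subsites, last_run_iso):
    """Return only the sub-sites modified since the previous run."""
    last_run = _parse(last_run_iso)
    return [web["Url"] for web in subsites
            if _parse(web["LastItemModifiedDate"]) > last_run]

# Last run happened on 5 March, so only siteB and siteC need refreshing.
changed = sites_to_refresh(SUBSITES, "2024-03-05T00:00:00Z")
```

The last-run timestamp itself can be stored as a single item in the top-level list, updated at the end of each successful flow run.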
If you have got this far into the blog, first of all: well done! Second, if you are interested in seeing different scenarios where this pattern has been implemented for some of our customers, please get in touch and speak to our Power Platform experts, who are always happy to “Geek-out”, share our solutions and lend a helping hand.