Calculate content size for documents with a custom content type

cancel
Showing results for 
Search instead for 
Did you mean: 
cesarista
Customer

Calculate content size for documents with a custom content type

Jump to solution

Hi:

We have a large repository with different (custom) content types, and we need a way of calculating the size on disk for all documents of each content type. Each custom content type (for example zpmSmiley Very Happyoc) is used by a content centric application, so it is the way for calculating how much sizes each application. Did you achieve this calculation / situation ? Maybe a custom SQL query ?

Best regards.

--C. 

1 Solution

Accepted Solutions
afaust
Master

Re: Calculate content size for documents with a custom content type

Jump to solution

The question always is: What do you count as the size of a document? Is it just the size of its current content or the sum of the size of all of its versions? What about copies that share the same content, e.g. when content has been deduplicated? Do you count the size on disk or the actual size, e.g. when compression has been (transparently) applied?

You could start from the following query that lists each node UID, its name, type and content property along with the size. It does not deal with versions or re-used content so if you sum up the sizes you will get a total large than the content store. Of course you can limit and group as you like.

select
   n.uuid as uid,
   np2.string_value as name,
   cu.content_size as size,
   ns1.uri as typeNamesüace,
   qn1.local_name as typeName,
   ns2.uri as propertyNamespace,
   qn2.local_name as propertyName
from
   alf_node_properties np1
   left join alf_node n on n.id = np1.node_id
   left join alf_node_properties np2 on np2.node_Id = np1.node_id
   left join alf_qname qn1 on qn1.id = n.type_qname_id
   left join alf_namespace ns1 on ns1.id = qn1.ns_id
   left join alf_qname qn2 on qn2.id = np1.qname_id
   left join alf_namespace ns2 on ns2.id = qn2.ns_id
   left join alf_qname qn3 on qn3.id = np2.qname_id
   left join alf_namespace ns3 on ns3.id = qn3.ns_id
   left join alf_content_data cd on cd.id = np1.long_value
   left join alf_content_url cu on cu.id = cd.content_url_id
where
   qn3.local_name = 'name'
   and ns3.uri = 'http://www.alfresco.org/model/content/1.0'
   and np1.actual_type_n = 21

View solution in original post

10 Replies
romschn
Senior Member

Re: Calculate content size for documents with a custom content type

Jump to solution

There are some useful db queries I found sometimes back here. http://blog.dbi-services.com/alfresco-some-useful-database-queries/ 

Kindly take a look to see if you could use any of them (possibly combining few of them) and see it works for you or not.

Hope this helps.

cesarista
Customer

Re: Calculate content size for documents with a custom content type

Jump to solution

Thank you Ramesh:

I knew the hyperlink. It is a good starting point for executing some Alfresco queries.

In fact it helps me to count zpm documents. For my example type (zpmSmiley Very Happyoc)

select count(*) 
from alf_node nd, alf_qname qn, alf_namespace ns
where qn.ns_id = ns.id
and nd.type_qname_id = qn.id
and ns.uri = 'http://www.zylk.net/model/zpm/1.0'
and qn.local_name = 'Doc'
and nd.store_id = 6;

It comes from a similar query from the article, but I need to include alf_namespace because I have more content types such as "arcSmiley Very Happyoc".

But I think need to include alf_content_data, alf_content_url tables for getting alf_content_url.content_size property and sum. Any Alfresco DBA ?

Thanks in advance.

--C.

afaust
Master

Re: Calculate content size for documents with a custom content type

Jump to solution

The question always is: What do you count as the size of a document? Is it just the size of its current content or the sum of the size of all of its versions? What about copies that share the same content, e.g. when content has been deduplicated? Do you count the size on disk or the actual size, e.g. when compression has been (transparently) applied?

You could start from the following query that lists each node UID, its name, type and content property along with the size. It does not deal with versions or re-used content so if you sum up the sizes you will get a total large than the content store. Of course you can limit and group as you like.

select
   n.uuid as uid,
   np2.string_value as name,
   cu.content_size as size,
   ns1.uri as typeNamesüace,
   qn1.local_name as typeName,
   ns2.uri as propertyNamespace,
   qn2.local_name as propertyName
from
   alf_node_properties np1
   left join alf_node n on n.id = np1.node_id
   left join alf_node_properties np2 on np2.node_Id = np1.node_id
   left join alf_qname qn1 on qn1.id = n.type_qname_id
   left join alf_namespace ns1 on ns1.id = qn1.ns_id
   left join alf_qname qn2 on qn2.id = np1.qname_id
   left join alf_namespace ns2 on ns2.id = qn2.ns_id
   left join alf_qname qn3 on qn3.id = np2.qname_id
   left join alf_namespace ns3 on ns3.id = qn3.ns_id
   left join alf_content_data cd on cd.id = np1.long_value
   left join alf_content_url cu on cu.id = cd.content_url_id
where
   qn3.local_name = 'name'
   and ns3.uri = 'http://www.alfresco.org/model/content/1.0'
   and np1.actual_type_n = 21
cesarista
Customer

Re: Calculate content size for documents with a custom content type

Jump to solution

Thank you Axel Faust‌ for the answer.

Relating to versions, size on disk and deduplication, thanks for opening new challenges. I have some minor questions, thanks for your time:

In the query appears the np1.actual_type_n. Where this come from ? Will be always 21?

np1.actual_type_n = 21 

When I run the previous query with some versions, I check that several versions appears in the list, and last version appears twice (ar-100.txt). I checked that this "doubled" last version does not appear in contentstore, and so they share the same content_url. May avoid them with a distinct(content_url) ? 

+--------------------------------------+------------+---------+----------------------------------+----------+
| uid                                  | name       | size    | typeNamespace                    | typeName |
+--------------------------------------+------------+---------+----------------------------------+----------+
| 50c123f8-3020-4366-b23e-97b8a22684b3 | ar-100.txt | 2747427 | http://www.zylk.net/model/zpm/1.0 | Doc |
| e0901a23-d0b7-4c54-b960-178a10e8fa2c | ar-100.txt | 390103  | http://www.zylk.net/model/zpm/1.0 | Doc |
| 67f8b82e-ca5d-4c1a-a28c-f8136e50c975 | ar-100.txt | 390103  | http://www.zylk.net/model/zpm/1.0 | Doc |
| cfcec513-fc1d-4d94-a7a0-10daab5812d1 | ar-101.txt | 4       | http://www.zylk.net/model/zpm/1.0 | Doc |
| 663daee4-9c91-4207-963f-d96f33a01d19 | ar-102.txt | 4       | http://www.zylk.net/model/zpm/1.0 | Doc |

And finally regarding deduplication and the differences between size on disk and size. Is there any way of calculating this from SQL side ?

Thanks in advance. 

Regards.

--C.

afaust
Master

Re: Calculate content size for documents with a custom content type

Jump to solution

The "21" is a system constant that originates from a Java enum in Alfresco code. It stands for "content data" and is only used on properties that have a valid link to a content on disk.

When using distinct you inherently loose the detail level required to determine the size on a node level, but you will end up with a correct total. One thing you can do to avoid versions in the query is add the condition on the store_id.

Size on disk and size in the DB tables will be identical for all Alfresco default content stores. Only when custom content stores are used that may transparently compress/uncompress may there be differences (e.g. I have such an addon). In case of deduplication it is not a question of "can I calculate it" on the SQL side (you can easily by using distinct) - it is more a question of "how do I correctly assign/track shared file sizes to individual nodes". That is not a technical challenge but a reporting one...

cesarista
Customer

Re: Calculate content size for documents with a custom content type

Jump to solution

Thanks for the answers, 

Regards.

--C.

udaikumar26
Customer

Re: Calculate content size for documents with a custom content type

Jump to solution

Can you please give the query to get the mimetype of a document. Also how do i query only the latest version.

What i am trying to get is the number of files in each type, for example how many PDFs are there?

kaynezhang
Advanced

Re: Calculate content size for documents with a custom content type

Jump to solution

Please try facetQueries feature and use content.mimetype as facetField .

About query syntax please refer to https://docs.alfresco.com/community/concepts/search-api-facetQueries.html.

udaikumar26
Customer

Re: Calculate content size for documents with a custom content type

Jump to solution

Thanks, how do i run this facetQuery? in a javascript file?