Hi:
We have a large repository with different (custom) content types, and we need a way of calculating the size on disk for all documents of each content type. Each custom content type (for example zpmoc) is used by a content centric application, so it is the way for calculating how much sizes each application. Did you achieve this calculation / situation ? Maybe a custom SQL query ?
Best regards.
--C.
Solved! Go to Solution.
The question always is: What do you count as the size of a document? Is it just the size of its current content or the sum of the size of all of its versions? What about copies that share the same content, e.g. when content has been deduplicated? Do you count the size on disk or the actual size, e.g. when compression has been (transparently) applied?
You could start from the following query that lists each node UID, its name, type and content property along with the size. It does not deal with versions or re-used content so if you sum up the sizes you will get a total large than the content store. Of course you can limit and group as you like.
select
n.uuid as uid,
np2.string_value as name,
cu.content_size as size,
ns1.uri as typeNamesüace,
qn1.local_name as typeName,
ns2.uri as propertyNamespace,
qn2.local_name as propertyName
from
alf_node_properties np1
left join alf_node n on n.id = np1.node_id
left join alf_node_properties np2 on np2.node_Id = np1.node_id
left join alf_qname qn1 on qn1.id = n.type_qname_id
left join alf_namespace ns1 on ns1.id = qn1.ns_id
left join alf_qname qn2 on qn2.id = np1.qname_id
left join alf_namespace ns2 on ns2.id = qn2.ns_id
left join alf_qname qn3 on qn3.id = np2.qname_id
left join alf_namespace ns3 on ns3.id = qn3.ns_id
left join alf_content_data cd on cd.id = np1.long_value
left join alf_content_url cu on cu.id = cd.content_url_id
where
qn3.local_name = 'name'
and ns3.uri = 'http://www.alfresco.org/model/content/1.0'
and np1.actual_type_n = 21
There are some useful db queries I found sometimes back here. http://blog.dbi-services.com/alfresco-some-useful-database-queries/
Kindly take a look to see if you could use any of them (possibly combining few of them) and see it works for you or not.
Hope this helps.
Thank you Ramesh:
I knew the hyperlink. It is a good starting point for executing some Alfresco queries.
In fact it helps me to count zpm documents. For my example type (zpmoc)
select count(*)
from alf_node nd, alf_qname qn, alf_namespace ns
where qn.ns_id = ns.id
and nd.type_qname_id = qn.id
and ns.uri = 'http://www.zylk.net/model/zpm/1.0'
and qn.local_name = 'Doc'
and nd.store_id = 6;
It comes from a similar query from the article, but I need to include alf_namespace because I have more content types such as "arcoc".
But I think need to include alf_content_data, alf_content_url tables for getting alf_content_url.content_size property and sum. Any Alfresco DBA ?
Thanks in advance.
--C.
The question always is: What do you count as the size of a document? Is it just the size of its current content or the sum of the size of all of its versions? What about copies that share the same content, e.g. when content has been deduplicated? Do you count the size on disk or the actual size, e.g. when compression has been (transparently) applied?
You could start from the following query that lists each node UID, its name, type and content property along with the size. It does not deal with versions or re-used content so if you sum up the sizes you will get a total large than the content store. Of course you can limit and group as you like.
select
n.uuid as uid,
np2.string_value as name,
cu.content_size as size,
ns1.uri as typeNamesüace,
qn1.local_name as typeName,
ns2.uri as propertyNamespace,
qn2.local_name as propertyName
from
alf_node_properties np1
left join alf_node n on n.id = np1.node_id
left join alf_node_properties np2 on np2.node_Id = np1.node_id
left join alf_qname qn1 on qn1.id = n.type_qname_id
left join alf_namespace ns1 on ns1.id = qn1.ns_id
left join alf_qname qn2 on qn2.id = np1.qname_id
left join alf_namespace ns2 on ns2.id = qn2.ns_id
left join alf_qname qn3 on qn3.id = np2.qname_id
left join alf_namespace ns3 on ns3.id = qn3.ns_id
left join alf_content_data cd on cd.id = np1.long_value
left join alf_content_url cu on cu.id = cd.content_url_id
where
qn3.local_name = 'name'
and ns3.uri = 'http://www.alfresco.org/model/content/1.0'
and np1.actual_type_n = 21
Thank you Axel Faust for the answer.
Relating to versions, size on disk and deduplication, thanks for opening new challenges. I have some minor questions, thanks for your time:
In the query appears the np1.actual_type_n. Where this come from ? Will be always 21?
np1.actual_type_n = 21
When I run the previous query with some versions, I check that several versions appears in the list, and last version appears twice (ar-100.txt). I checked that this "doubled" last version does not appear in contentstore, and so they share the same content_url. May avoid them with a distinct(content_url) ?
+--------------------------------------+------------+---------+----------------------------------+----------+
| uid | name | size | typeNamespace | typeName |
+--------------------------------------+------------+---------+----------------------------------+----------+
| 50c123f8-3020-4366-b23e-97b8a22684b3 | ar-100.txt | 2747427 | http://www.zylk.net/model/zpm/1.0 | Doc |
| e0901a23-d0b7-4c54-b960-178a10e8fa2c | ar-100.txt | 390103 | http://www.zylk.net/model/zpm/1.0 | Doc |
| 67f8b82e-ca5d-4c1a-a28c-f8136e50c975 | ar-100.txt | 390103 | http://www.zylk.net/model/zpm/1.0 | Doc |
| cfcec513-fc1d-4d94-a7a0-10daab5812d1 | ar-101.txt | 4 | http://www.zylk.net/model/zpm/1.0 | Doc |
| 663daee4-9c91-4207-963f-d96f33a01d19 | ar-102.txt | 4 | http://www.zylk.net/model/zpm/1.0 | Doc |
And finally regarding deduplication and the differences between size on disk and size. Is there any way of calculating this from SQL side ?
Thanks in advance.
Regards.
--C.
The "21" is a system constant that originates from a Java enum in Alfresco code. It stands for "content data" and is only used on properties that have a valid link to a content on disk.
When using distinct you inherently loose the detail level required to determine the size on a node level, but you will end up with a correct total. One thing you can do to avoid versions in the query is add the condition on the store_id.
Size on disk and size in the DB tables will be identical for all Alfresco default content stores. Only when custom content stores are used that may transparently compress/uncompress may there be differences (e.g. I have such an addon). In case of deduplication it is not a question of "can I calculate it" on the SQL side (you can easily by using distinct) - it is more a question of "how do I correctly assign/track shared file sizes to individual nodes". That is not a technical challenge but a reporting one...
Can you please give the query to get the mimetype of a document. Also how do i query only the latest version.
What i am trying to get is the number of files in each type, for example how many PDFs are there?
Please try facetQueries feature and use content.mimetype as facetField .
About query syntax please refer to https://docs.alfresco.com/community/concepts/search-api-facetQueries.html.
Thanks, how do i run this facetQuery? in a javascript file?
Ask for and offer help to other Alfresco Content Services Users and members of the Alfresco team.
Related links:
By using this site, you are agreeing to allow us to collect and use cookies as outlined in Alfresco’s Cookie Statement and Terms of Use (and you have a legitimate interest in Alfresco and our products, authorizing us to contact you in such methods). If you are not ok with these terms, please do not use this website.