I've searched several threads regarding SQL statements to gather information on files, folders, etc. I'm well aware that the proper way would be to use the API or CMIS.
But for my case I have to rely on SQL.
When selecting the nodes (from the alf_node table) I filter on alf_qname.local_name = 'content'.
These content nodes, however, contain (uploaded) files, wiki-pages, blog-posts, discussion-topics and other nodes.
How can I properly distinguish these different content nodes? I.e. how to only select (uploaded) files? How to only select blog-posts? etc.
Solved! Go to Solution.
Well, I finally figured "something" out.
I can retrieve most of the nodes that refer to "files". It's neither well optimized nor can I guarantee it will work.
But for my needs it is currently sufficient.
SELECT n.id AS nodeId, n.uuid AS fileId, n.audit_created AS created, n.audit_modified AS last_updated, anp.string_value AS name, n.audit_creator AS creator, CONCAT('http://alfresco.bas.uni-koblenz.de/share/page/document-details?nodeRef=',store.protocol,'://',store.identifier,'/',n.uuid) AS content FROM alf_node AS n JOIN alf_qname AS q on (n.type_qname_id = q.id AND q.local_name='content') JOIN alf_store AS store ON (store.id = n.store_id) JOIN alf_node_properties AS anp on (n.id = anp.node_id) LEFT JOIN alf_content_data AS acd ON (anp.long_value = acd.id) JOIN alf_child_assoc AS aca ON (aca.child_node_id = n.id) JOIN alf_node AS nP ON (aca.parent_node_id = nP.id) JOIN alf_qname AS aqn ON (np.type_qname_id = aqn.id) WHERE anp.qname_id IN (SELECT id FROM alf_qname WHERE local_name='name' AND ns_id IN (SELECT ns2.id FROM alf_namespace AS ns2 WHERE uri LIKE '%content%')) AND n.audit_creator NOT LIKE 'System' AND aqn.local_name not like 'archiveUser' AND aqn.local_name not like 'person' AND anp.string_value like '%.%' AND anp.string_value NOT LIKE '%~%~%' AND anp.string_value NOT LIKE 'dashboard.xml'
You'll need to retrieve by content type, e.g.:
SELECT string_value AS document_name FROM alf_node node INNER JOIN alf_node_properties ON (id=node_id) INNER JOIN alf_qname ON (qname_id=alf_qname.id) WHERE ns_id IN (SELECT ns.id FROM alf_namespace AS ns WHERE ns.uri='http://www.alfresco.org/model/content/1.0') AND local_name='name';
thanks for your answer, but this is exactly what I did (including the namespace).
But the results contain nodes that refer to wiki-pages, discussion-posts, etc.
Do you know how to filter the results even further to only include files that have actually been uploaded by users?
Thanks for your patience!
When running exactly the query you sent me the results contain nodes with names like:
The results thus also contain a plethora of nodes referring to "files" which have not been "uploaded" by users.
Therefore my question: "How to distinguish content-nodes that refer to wiki-pages, discussion-posts, uploaded files, etc.?"
In short, in default Alfresco without any customisations in place it is impossible to differentiate between documents, wiki articles, blog posts etc. since they all are of type cm:content. In some of my past projects I had developed automation which specialises e.g. wiki articles into a custom wiki content type to be able to better differentiate them from "real" content, e.g. by excluding the custom type from super-type queries. In a default Alfresco, the only way to reliably exclude the "other" types of contents in a site from "real" content is by filtering on the parent node and making sure it is either a regular folder or a folder with site component ID set to "documentLibrary", and NOT a folder with site component ID set to anything else...
You can exclude thumbnails / renditions by excluding any content nodes with the cm:thumbnail / rn:rendition type. You can exclude the surf-config component descriptor nodes by using a name-pattern filter, as they unfortunately also share the generic content type and cannot otherwise be differentiated. This should also work for deployed module descriptors as well as dashboard page configuration files. Content in the Data Dictionary also shared the generic content type, and that will be much harder to filter out in a DB query.
Also note that many files will be associated with various users even if they did not "upload" them. E.g. some files are created automatically when you create a site, change your dashboard configuration or simply try to preview a document. For that reason, it is not really possible to distinguish documents in the system between "uploaded by user" (via the upload function) and "otherwise/indirectly created by user".
Generally I recommend using proper Alfresco FTS instead of DB queries, as that would allow you to use PATH / ANCESTOR conditions to limit the sub-structures from which you retrieve content nodes, and implicitly deals with any prefix - namespace URI mapping for you. With EXACTTYPE you can also do targeted retrieval without being affected by child / parent type relationships.
thanks for your detailed answer.
I need to access all information via Dremio. Therefore I am afraid that I have to adhere to SQL. I am not aware of any suitable DataSource beside SQL which is offered by Dremio.
I will try to create SQL statements for each node type which is relevant for me.
I have to admit, though, that I'm a little disappointed that there isn't an easier way. Especially considering that the overall data structure of Alfresco is designed in such a stringent and compelling way.
Well, why don't you create your own custom content model and apply to all the items you need it?
If you're applying this kind of solution, you can adapt the query to your new namespace, so will be extracted those data you need it.
You can do an easy test:
Some url that can help you:
If this test is a success, then we will let you know which next steps you're need to perform.
Let us know.
if I wanted to change it just for my instance, that would probably be a great idea.
However, I need to design my queries to be compatible with other instances.
I am aware that using the API would be good for this very reason, however, as I said, I am reliant on Dremio.