Commons talk:Categories

This is the talk page for discussing improvements to Commons:Categories.
Archives: 1, 2, 3, 4, 5

Sortkey recommendations

Regarding this:

The special sortkey τ (lowercase Greek letter Tau) is used to sort templates at the end of the related Commons-category, see for example Category:Transport templates sorted in Category:Transport. (Sorting in Commons is not case sensitive so only uppercase Τ (Tau) is shown.)

I wonder whether this is a good recommendation. I tried it for a few categories and found it quite confusing, because the uppercase Tau is, in practice, visually indistinguishable from the Latin letter T. I see a lot of template categories with a sortkey of ~ and would actually consider that a very good idea, because it's sorted after Z, it's visually recognizable, and it's a character which you can probably find on far more keyboards than the Tau.

On a side note, I would expect using three different dashes as sortkeys to create a lot of confusion, and many people will find it hard to tell them apart. So I would also suggest removing any mention of the em dash and the en dash here.

Thanks -- Reinhard Müller (talk) 15:35, 16 February 2024 (UTC)

agreed. i think common practice is using ~ for anything commons-related or, more broadly, wikimedia-related.
the dashes were added by https://commons.wikimedia.org/w/index.php?title=Commons%3ACategories&diff=prev&oldid=703824439 . RZuo (talk) 15:40, 16 February 2024 (UTC)
@W like wiki: any input from your side? --Reinhard Müller (talk) 15:53, 22 February 2024 (UTC)
Grüß Dich @Reinhard Müller: Yes, I agree. I can't remember on which Commons page I read this recommendation about the Greek Tau before including it here, but I always had the same problems: I needed to copy and paste it from here, and I even kept a copy of it on my user page. I was also not happy that it looks the same as a normal "T", but when I wrote this section I could not create a new rule. If we change it now, I would appreciate it!
Same with the dashes. Maybe someone knows whether they are useful, but here too I would appreciate a change, meaning deletion in this case. Best Regards -- W like wiki (Please ping me!) 16:25, 22 February 2024 (UTC)
Thanks to everybody who commented! I updated the page and hope that I didn't mess up anything regarding the translation. --Reinhard Müller (talk) 18:15, 22 February 2024 (UTC)
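
The ordering argument in this thread can be illustrated with a small sketch. It assumes, as a simplification, that sort keys are upper-cased and then compared by Unicode code point; that matches the behaviour described above (case-insensitive, ~ after Z) but is not taken from MediaWiki's source. The function name `sort_members` and the sample keys are hypothetical.

```python
def sort_members(sortkeys):
    """Order sort keys the way a case-insensitive, code-point-based collation would.

    Assumption: keys are upper-cased, then compared by Unicode code point.
    """
    return sorted(sortkeys, key=lambda key: key.upper())


# Hypothetical sort keys for members of one category.
keys = ["zebra", "~Templates", "τTemplates", "apple", " Index", "Tram"]
ordered = sort_members(keys)
print(ordered)

# The heading shown for a τ key is the uppercase Tau, which is a different
# character from the Latin T even though the two glyphs look alike.
tau_heading = "τ".upper()
print(tau_heading, "T", tau_heading == "T")
```

Under this assumption both ~ and τ land after Z, so either would push templates to the end of the listing; the practical difference is that the Tau heading is easy to mistake for a Latin T, while ~ is not.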

Controversial categories

Hi, I'd like to get feedback regarding categories that can be seen as controversial. On en-wiki, there is a rule that

Categorizations should generally be uncontroversial; if the category's topic is likely to spark controversy, then a list article (which can be annotated and referenced) is probably more appropriate.

As far as I can see there is no such policy on Wikicommons. Is there some other policy which deals with this issue? What is the community consensus?

To provide a concrete example, this edit added back the category Territories under occupation by Russia to the category Abkhazia. This is controversial, since while the overwhelming majority of countries consider Abkhazia to be a part of Georgia, only a minority explicitly said that it's occupied by Russia (see Wikipedia:Russian-occupied_territories_in_Georgia#International_position).

I believe that this category isn't helpful since the category name cannot explain all these nuances. It would be better to create a page/gallery with the related media. I'm pinging User:Laurel_Lodged who has added this category. Alaexis (talk) 09:21, 28 May 2024 (UTC)[reply]

I agree that some such policy is needed on Commons, and I agree that "Categorizations should generally be uncontroversial". But one editor's uncontroversial is another editor's hot potato. Unlike Wiki, Commons does not lend itself to list article creation. So the likely solution is a case-by-case evaluation and an agreement to adhere to community consensus. By the way, regarding Abkhazia, Wiki itself says, "On 23 October 2008, the Parliament of Georgia declared Abkhazia a Russian-occupied territory, a position shared by most United Nations member states."[1] So it's not just me. Laurel Lodged (talk) 15:05, 28 May 2024 (UTC)
Ehh, where do you see that in the source? I've tagged it on en-wiki. If I'm missing something and the source does say it, then indeed it wouldn't be controversial and I would not object to placing Abkhazia in this category. Alaexis (talk) 20:23, 28 May 2024 (UTC)[reply]
"Georgia asserted that the territories of South Ossetia and Abkhazia, including the upper Kodori Valley, were occupied by Russian forces. On 23 October, the Parliament of Georgia adopted a law declaring Abkhazia and South Ossetia “occupied territories” and the Russian Federation a “military occupier.” This claim was reiterated […] In describing the “current occupation” Georgia also stated: “the western part of the former ‘buffer zone’ (the village of Perevi in the Sachkhere District) remains under Russian occupation.”" If Wiki is making claims not supported by sources, then Wiki is the place to make those edits. Laurel Lodged (talk) 10:30, 30 May 2024 (UTC)
Yes, absolutely. But there is a difference between *Georgia* considering it an occupied territory and "most UN members" sharing this position. I never argued with the former. Alaexis (talk) 08:42, 1 June 2024 (UTC)[reply]
 Question @Alaexis, @Laurel Lodged It is hard to tell if this is really a question about general policy or if it is really a discussion about a particular case. In the case of the latter, this should really be had as a CfD over Category:Abkhazia. If there is a change to policy that you think would help improve things, that should be discussed here, and you can certainly refer to this case as reference. Josh (talk) 20:15, 18 July 2024 (UTC)[reply]

FYI: Moved historical page, redirected that target to this page

"Commons:Naming categories" now redirects to this commons: ns page. It was problematic for the number of links (internal and from WD), and the confusion being caused with the pre-existing arrangement.

The page that was at that space is now at Commons:Naming categories (historical). The number of links to its detail are minimal, and it should not be problematic for functional management of this site having it moved.  — billinghurst sDrewth 00:02, 31 May 2024 (UTC)[reply]

@Billinghurst Thank you for doing that, it is a big help to avoid confusion for folks. Josh (talk) 20:02, 18 July 2024 (UTC)[reply]

Sortkey recommendations

a question that has bounced around in my mind a few times: what are the purposes of each of the symbolic sortkeys? the most common ones I see are '(space)', '*', '+' and '~'. what are their roles?

So far, that's not clearly defined, and different people use completely different sortkey prefixes for the same purpose.
I have collected a few ideas about what could be seen as "best practice". I don't know whether we actually want to come up with a policy or at least a recommendation, but if so, this list might serve as a basis for it. Thanks --Reinhard Müller (talk) 07:02, 9 July 2024 (UTC)

another thing while discussing sorting is something I often see in category pages with accent marks in their titles: they use {{DEFAULTSORT:}} to remove the accents. a simple example is 'café', which is given {{DEFAULTSORT:cafe}}. if this is something that should be encouraged on the wiki, please feel free to add it to the policy! Juwan (talk) 10:11, 8 July 2024 (UTC)

@Reinhard Müller, thank you for sharing some ideas on ways to use a variety of sort keys to sub-sort by type of sub-topic. I am often frustrated by the willy-nilly use of special characters by users, especially to '+1' their preferred topics to the top of the list. I readily use a few established special characters for sorting non-topical categories, such as a space for index categories, # for numbers, ? for 'unidentified' (maintenance) categories, and ~ for some other types of maintenance categories. For topical non-number categories however, I do not see the attraction of using special character sorting, as it requires a few things at a minimum:
  1. The user must already be familiar with the sort key special character system.
  2. The user must parse the topic they are seeking, in order to figure out which special character they should look under.
  3. The system has to be consistently-enough employed that once a user has passed hurdles 1 and 2, they can have some confidence they will actually find what they are looking for.
Currently, none of these are true for a lot of the special characters, and so I generally resist using them for topical categories, and while I think your list is well thought-out, I don't think in the end that it provides any real additional value over using the alphabetical sort system that categories are fundamentally based on.
As for using sort-keys for normal alphabetical sorting (e.g. using sortkey 'buildings' to sort Category:Science buildings in Category:Science), that is extremely useful and I use it a lot. I do think some additional guidelines right here on COM:CAT to help users quickly grasp common practices is a good idea. Josh (talk) 19:56, 18 July 2024 (UTC)[reply]
it is certainly a very nice scheme. the only issue in my case is what you've raised. is this perhaps going too far? as in, is it too complex for someone to understand, especially without necessarily having to read the policy? Juwan (talk) 20:21, 18 July 2024 (UTC)[reply]
@JnpoJuwan I always try and keep accessibility front and center in my mind when considering categorization. For someone like me who's been on the project since its inception (or close enough anyway), I am able to take the time to learn and apply various elegant schemes for organization, but for especially newer or irregular users, that really isn't practical. Even as a veteran, I am routinely frustrated when I look for something and don't find it (such as buildings under b) but instead have to then figure out a) whether the sub even exists, and b) what special character someone came up with to sort it under. Having a standardized key list and implementing it consistently might help that for me since I spend a lot of time in categories and can learn and keep fresh that knowledge (I even kind of like the scheme), but I still don't think it helps the bulk of less-regular users just looking to sort their contributions or find images for their projects. For this reason, I think it falls down on the accessibility question. Josh (talk) 20:59, 18 July 2024 (UTC)
@JnpoJuwan As far as accent marks (or their suppression) in sort keys are concerned, I have seen some discussion on whether the current sort algorithm handles accents and other diacritics as it should. It is certainly not consistent with how search handles them. I don't know if we really should be suppressing them though, and I generally don't in the few cases where I've had to worry about them. My native language doesn't use diacritics (except in loanwords), so I probably don't have the best intuitive feel for which way to go on this question. Josh (talk) 20:01, 18 July 2024 (UTC)
to give you some perspective, in my native language Portuguese at least, we tend to ignore accent marks when sorting, so an algorithm would sort it like so: aa áb ac. I haven't seen how other languages manage their sorting schema (speakers of, for example, Spanish, German, Swedish would probably want diacritics kept), I need more opinions on that side. Juwan (talk) 20:18, 18 July 2024 (UTC)[reply]
@JnpoJuwan Thank you for that insight, it is always fascinating how different languages have such different perspectives on the world. As a mono-lingual project with a multi-lingual audience, that remains a big challenge for Commons to grapple with. Josh (talk) 21:01, 18 July 2024 (UTC)[reply]
Spanish considers ñ a distinct letter between n and o; accented vowels are treated as if the accent weren't there, except if words are otherwise identically spelled (e.g. que and qué), in which case the unaccented one comes first. Historically they treated ch as a single letter sorting after c and ll as a single letter sorting after l, but in the last few decades that has largely disappeared.
German normally sorts ä, ö, and ü as ae, oe, and ue; the difference is considered a typographic convention. Ditto for ß and ss.
Romanian sorts ș after s and ț after t and considers them distinct letters. Similarly a, ă, â are considered distinct letters (in that order), and the same for i and î.
Those are the only languages other than English where I know enough to speak confidently. Inconveniently, as far as I can tell, MediaWiki doesn't readily support correctly sorting ñ, ș, or ț, nor the three non-standard Romanian vowels. - Jmabel ! talk 21:43, 18 July 2024 (UTC)
In my native language Hungarian, ö (and ő) is sorted after o (and ó), the same goes for u/ú and ü/ű (other diacritic differences – including those between o and ó, between ö and ő etc. – count only if there’s no other difference, but di- and trigraphs have their own places – if they really are di- or trigraphs, and not only those letters next to each other; a rule nearly impossible to create an algorithm for). This means that according to the Hungarian rules, Olaszliszka goes before Öcsöd – however, according to the German rules, it’s just the other way round, Oecsoed being before Olaszliszka! This demonstrates that a Commons-wide default cannot fulfill all languages’ needs, so I think the only sensible default other than the current one is completely disregarding accents, i.e. treating ñ and ň the same as n; ö, ő and ô the same as o; ș, ş and š the same as s, and so on. --Tacsipacsi (talk) 22:21, 18 July 2024 (UTC)
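
To make the conflict between these conventions concrete, here is a rough sketch of three possible sort keys applied to the Olaszliszka/Öcsöd pair. The helper names (`strip_accents`, `german_key`, `accent_free_key`) are hypothetical, the German mapping covers only the four characters mentioned above, and none of this reflects how MediaWiki actually collates; it only shows that the keys disagree.

```python
import unicodedata

# Assumption: the German convention described above (ä -> ae, ö -> oe, ü -> ue, ß -> ss).
GERMAN_EXPANSIONS = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def strip_accents(text):
    """Drop combining marks, so 'café' becomes 'cafe' and 'Öcsöd' becomes 'Ocsod'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def german_key(text):
    """Sort key that expands umlauts and ß the German way."""
    composed = unicodedata.normalize("NFC", text)
    return composed.lower().translate(GERMAN_EXPANSIONS)


def accent_free_key(text):
    """Sort key that disregards accents entirely."""
    return strip_accents(text).lower()


names = ["Olaszliszka", "Öcsöd"]
by_german = sorted(names, key=german_key)
by_accent_free = sorted(names, key=accent_free_key)
by_code_point = sorted(names)
print(by_german)
print(by_accent_free)
print(by_code_point)
```

For this pair, the German-style key and the accent-free key both put Öcsöd first, while plain code-point order puts Olaszliszka first, which happens to coincide with the Hungarian order here. A real Hungarian key would also need the digraph and trigraph handling mentioned above, which this sketch does not attempt.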
in short, is what {{DEFAULTSORT:}} tries to achieve a way to bypass MediaWiki's (current) technical restrictions? Juwan (talk) 22:53, 18 July 2024 (UTC)
@JnpoJuwan: not really. Even with {{DEFAULTSORT:}} we have to live with most of those restrictions. But (besides the issues that started this discussion about handling incommensurate subcats separately) {{DEFAULTSORT:}} lets us
  • sort people "last name first" (though increasingly this happens implicitly as a side effect of Wikidata Infoboxes)
  • sort numbers sanely (by default they'd sort alphabetically) so we can force a sequence 1, 2, 3, ... 9, 10, 11, ... 20, ... instead of 1, 10, 11, ... 2, 20, ... 3, ... 9
  • do things like in a language where every public square is going to begin with "Plaza", sort a list of public squares by the part of the name that actually matters, so not everything is just lumped under "P"
As noted above, some other uses are more controversial. - Jmabel ! talk 05:04, 19 July 2024 (UTC)[reply]
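
Two of the uses listed above (numeric order and ignoring a shared leading word) can be sketched as follows. This is only an illustration of how such sort keys might be derived: the function names `pad_numbers` and `drop_prefix`, the padding width, and the sample titles are all assumptions, not an existing Commons tool.

```python
import re


def pad_numbers(title, width=4):
    """Zero-pad every run of digits so that text comparison matches numeric order."""
    return re.sub(r"\d+", lambda match: match.group().zfill(width), title)


def drop_prefix(title, prefix="Plaza "):
    """Return the distinctive part of a name, ignoring a shared leading word."""
    return title[len(prefix):] if title.startswith(prefix) else title


# Hypothetical category titles.
routes = ["Route 1", "Route 10", "Route 11", "Route 2", "Route 20", "Route 9"]
plain_order = sorted(routes)                     # text order: 1, 10, 11, 2, 20, 9
numeric_order = sorted(routes, key=pad_numbers)  # numeric order: 1, 2, 9, 10, 11, 20
print(plain_order)
print(numeric_order)

squares = ["Plaza Mayor", "Plaza de España", "Plaza del Sol"]
square_keys = [drop_prefix(name) for name in squares]
print(square_keys)
```

The padded string (for example "Route 0002") is what would be used as the sort key, while the visible category title stays unchanged.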
sorry, I didn't specify that I mean in this context. these are all perfectly fine uses that I am aware of and have used before. Juwan (talk) 23:46, 21 July 2024 (UTC)[reply]

Use of English varieties in category names

There was a discussion at Commons talk:Categories/Archive 4#LANGVAR in category names ?, and many users had expressed support to implement local dialectal names for categories. However, there was no consensus on the proposal by Joshbaumgartner, which would implement it. So I have modified the proposal and drafted it at User:Sbb1413/ENGVAR proposal. It is not intended to be a separate policy. Rather, it is intended to be additions and modifications of the existing policy at COM:CAT to accommodate local dialectal terms. The main changes of this proposal include the avoidance of ambiguous dialectal terms. Sbb1413 (he) (talkcontribsuploads) 14:02, 18 July 2024 (UTC)[reply]

I have formally withdrawn the original proposal. See the "New proposal" section below for further discussions. --Sbb1413 (he) (talk • contribs • uploads) 18:27, 11 August 2024 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

@Sbb1413 Thank you for raising this for some further discussion. This represents a potentially significant redirection in our category naming approach and as such you are completely correct to frame it as a change/addition to current Commons category policies as opposed to a new stand-alone policy. I think this is a good approach, as it necessarily requires us to consider any impacts on existing comcat and adjust them at the same time to rectify any inconsistencies that might be exposed were this merely to be put forward as an unrelated new policy. I unfortunately have some other priorities at the moment, but I want to give this due thought and provide a comprehensive input, though it may be a week or so before I can do that properly. In the meantime, I would like to get clarity on a couple of items just to understand the starting point here as accurately as possible:
  1. Would the intent be to have an official list of approved language variations for specific topics with due process required for additions or changes, or is each topic to simply be up to the normal wrangling among users over which term is right in their given locale?
  2. Would the intent be to retroactively apply this policy to existing topics, or to stick with the policy of needing more than just langvar reasons to change an existing category?
  3. What ultimate authority, if any, would we be relying on to determine correct langvar?
I'm sure there will be more to follow. Josh (talk) 19:39, 18 July 2024 (UTC)[reply]
@Joshbaumgartner My answers:
  1. The intent is to have a consensus-based list of accepted language variations for certain topics. The list will be inserted at the top of a topic category. For small categories, additions and changes would be done boldly, while larger categories would require agreements with some other users.
  2. This policy will be retroactively applied to existing topics.
  3. The ultimate authority to determine the correct ENGVAR is of course consensus.
Sbb1413 (he) (talk • contribs • uploads) 11:27, 19 July 2024 (UTC)
I have created a rough draft of a templated list of consensus-based English dialectal terms at User:Sbb1413/ENGVAR template. This can be a standalone template or a part of {{Topic by country}} and {{Country category}} templates. China and Russia are included for their own English terms for "astronaut". Sbb1413 (he) (talkcontribsuploads) 12:03, 19 July 2024 (UTC)[reply]
@Sbb1413 Thanks, those are more or less along the lines of what I would have guessed. I think the right way to do the {{Topic by country}}, etc. templates would be to build in a langvar switch of some sort, ideally without requiring manual activation, with the approved variations added into the data templates. However, I wouldn't really worry too much about exactly how to do the templates at this stage. I would simply remark that converting templates to support this will be its own effort for a team to develop and implement once the new policy is enacted. This discussion should focus on what the right policy is and getting the language (no pun) correct for COM:CAT. Templates and other tools will have to follow suit. Josh (talk) 12:43, 19 July 2024 (UTC)[reply]
One of the things that strikes me as I think about this direction, is that all of the logic for saying that Australia categories should be given Australian English names instead of the universal English topic name, or Canada, the US, etc., is the same logic that would say France should use French or Mexico should have Mexican Spanish variations of a topic name. Obviously, we are starting this discussion limited to English variants, but realistically, I'm not sure that is anything more than a purely arbitrary line we are drawing. Just a thought. Josh (talk) 12:52, 19 July 2024 (UTC)[reply]
@Joshbaumgartner We can use the same English term for all countries if it is used by all the major dialects. If such a term doesn't exist, some countries will use their own regional terms while others will use the "universal English topic name". Since our core naming policy is to use English in category names as much as possible, we obviously shouldn't extend this proposal to other languages. Although gallery names can use local languages, the dialects of other languages (except Portuguese) don't really differ by spelling or vocabulary. Sbb1413 (he) (talk • contribs • uploads) 13:07, 19 July 2024 (UTC)
Well our core policy is also to apply the Universality Principle, and this proposal is considering upending that. In fact, from a legal theory perspective, I would posit that the universality principle is the primary supporting principle of the English-only naming policy, or at least it is the key reflection of the policy's intent, i.e. if the UP goes (or is neutered), then what real basis is there for the English-only policy? I totally understand that we don't necessarily want to extend this proposal to other languages, as that would be a much bigger fish to fry and may bring in some vocal opposition, but I'm not sure if maintaining English-only is still tenable if the UP is eroded. I don't think this is a point for or against this effort, just a consideration of potential future ramifications. Also, this is a fundamental policy change being considered, so I would not consider anything obvious. As for variations in other languages, I know Spanish spoken in Mexico has plenty of variation vis-a-vis Spain or even other parts of Latin America. That is my only personal practical experience with a non-English language, so I'm 2-for-2 so far on language variations being a thing. However, even if a language is completely homogenous across its usage, it doesn't change the point that localization is localization, whether we are talking variants or entirely different languages. Josh (talk) 13:32, 19 July 2024 (UTC)[reply]

 Oppose Can we have some compassion for people on Commons who do not have English as their native language? It is OK to have English as the main language for categories; I agree with that. But let's not allow all kinds of local variations of English within a category string.

  • Not only for myself (Dutch, which is a language in the same (Germanic) language family as English, and so relatively close to English), but I also think of all people at the other end of the language spectrum, who are even used to a script other than Latin (a lot of Asian languages + Cyrillic + Greek and I might forget some).
  • I do not only search along the line of a category string, but also via the search box. If I know that the main category is "Gray", I expect that I can search for Gray houses in Norfolk, England as well, and not have to know that it is then "Grey". And if the main category is "Flower shops", I do not want to have to search with "Florist shops" for Leeds.

Let's keep it simple, and let everybody adjust; compared with what non-native English speakers face, it looks to me like a small sacrifice for people who are used to local variants of English. --JopkeB (talk) 10:19, 22 July 2024 (UTC)

@JopkeB, this is one of my primary concerns. Requiring that users have some ability to work in English (possibly through 3rd party software) seems at the moment unavoidable given the technical limitations of the basic software. However, this is a multi-lingual project, and conceivably should permit full utility for users regardless of their particular language skills. Universality seems to be the best solution for this that I am aware of since at least once a term or phrase is known, it can be relied on going forward, per your 'gray/grey' example. As a native English speaker (as are probably the plurality of Commons users), I have no problem at all reading 'flower shops' and 'florist shops' or 'gasoline stations' and 'filling stations' and instantly understanding what is being said, but even then, I share the frustration of having to try several searches for a sub-category, even when I know the parent category name, not sure whether the sub-category just doesn't exist or if I should try another different term or format. I can only imagine that frustration is multiplied for those without native or even substantial English knowledge. Any proposal to deviate from Universality would need to clearly show that it will not add any additional hurdles for non-English users (the majority of the planet) for me to support it. Josh (talk) 20:24, 22 July 2024 (UTC)
  •  Oppose Per JopkeB and Joshbaumgartner. You can get into some pretty sketchy and pedantic territory when you allow for local variations. Even English, which would seem to have only a few, actually has quite a lot. See California English for one example outside of British and American English. There's also Scottish English, Irish English, etc. We shouldn't have to figure out which variation to go with every time we want to search for a category, both per the Universality Principle and because it would be a major pain in the ass. It makes sense to use American English as the standard though, because it's the most widely used form of English in the world. No offense to British people, but British English is only spoken in Britain and a few former colonies, and there's no reason to overturn the Universality Principle (which, as Josh points out, is at the core of Commons policy) just to adopt a niche form of English. That doesn't even get into variations of other languages either. --Adamant1 (talk) 03:03, 27 July 2024 (UTC)
    • The mention of "California English" is really an overstatement. The linked article is mainly about regional pronunciation variants in three different regions of California. About the only thing here that would be different in writing is that Californians commonly use "the" with a highway route number ("the 5" instead of "Interstate 5" or "I-5"). - Jmabel ! talk 17:02, 27 July 2024 (UTC)[reply]
    Take it or leave it. It was just the first example that came to mind. Of course I'm not a linguist. Nor am I saying the idea of local variations should be rejected purely (or even at all) because of California English. Or really anything else. I'm just pointing out that the idea that there are only two variations of English in existence, American and British, is overly simplistic. --Adamant1 (talk) 00:16, 28 July 2024 (UTC)
    Agree: "We shouldn't have to figure out which variation to go with every time ...".
    Not agree: "use American English as the standard", because British English is NOT only spoken in Britain; we in the Netherlands learn British English at school and I guess there are other countries where British English is taught. So it is not only the countries where English is the main language that you have to take into account. JopkeB (talk) 04:09, 27 July 2024 (UTC)
    Likewise I do not agree with adopting any specific variant of English as standard, but instead I think the status quo is correct. So long as the term is mutually intelligible and accurately conveys the topic, it should not be changed just to align with one or the other variant. Whether it is gray or grey, any English speaker can easily understand what is being covered, even if they may smirk at what they consider a misspelling, so no real reason to go with one over the other, so long as what we go with is consistently used throughout Commons. I can only imagine the howls of protest if we start trying to rename everything to match UK, US, etc. spelling/vocabulary--probably only slightly less than the ruckus that will happen if every locality is fair game to argue about what local term is most common there. Josh (talk) 05:07, 27 July 2024 (UTC)[reply]
    I'm not necessarily advocating for adopting any specific variant of English as a standard. But it's just a fact that American English is the standard, due to how widely it's used compared to other ones, and universality. I'm sure there are edge cases where that's not the case though, which is fine. I don't think it's something we can realistically enforce one way or another. Although adopting this clearly wouldn't be beneficial even if people are already using local variations without it. --Adamant1 (talk) 06:19, 27 July 2024 (UTC)
    @Adamant1: Can you prove that "it's just a fact that American English is the standard ..."? How is it measured? JopkeB (talk) 04:53, 28 July 2024 (UTC)
    @JopkeB: I mentioned it below this, but from what I remember when I looked into it a year or two ago, the top countries that contributors come from on here are the United States and Germany. Germans usually (if not exclusively) speak American English. Regardless, it largely depends on the numbers from there, but if people from the United States and Germany make up the largest share of English-speaking users on here, then American English is just naturally going to be the standard. That's assuming my memory is correct and that nothing has changed about the user base since then. --Adamant1 (talk) 05:13, 28 July 2024 (UTC)
  • Option B since some level of consistency is better than none. Laurel Lodged (talk) 07:36, 27 July 2024 (UTC)[reply]
  • Option B Best of these options. One could at a later point think about machine translated category titles where the things are mostly addressed that way which is when option A & C become more reasonable. --Prototyperspective (talk) 10:13, 27 July 2024 (UTC)[reply]
  • Option A because the discussion of "national variants" doesn't help the universality. Living in Europe and not being a native English speaker, I learned British English and found this in most contacts on the continent. And if Sbb1413 says that "Although gallery names can use local languages, the dialects of other languages (except Portuguese) don't really differ by spelling or vocabulary" it only shows a lack of knowledge about other languages. I speak a German dialect that is not normally written, but even our written language has so many words that are different from the words used in Germany that misunderstandings are possible. Unless a really universal English can be defined, it is better to accept "regional varieties" and help users with good category trees. I would be happy to find the gray houses in the grey houses main category, rather than knowing I must search for gray (or vice versa). We can only continue peacefully if the existing varieties of English are accepted and no single one is declared universal. And I clearly contradict the above statement "British English is only spoken in Britain and a few former colonies". Please accept variety. It will help universality more than putting regional English varieties down. And watch the category trees! -- Gürbetaler (talk) 22:00, 27 July 2024 (UTC)
British English can be extended to a few other places besides former Crown Colonies. It still doesn't disprove my point that American English has vastly more usage than British English globally. Not that it matters though, since nowhere have I said I think we should replace the former with the latter, or vice versa. Both can coexist perfectly fine depending on the situation. Although there's still the practical issue of universality, which that is inconsistent with regardless. --Adamant1 (talk) 00:03, 28 July 2024 (UTC)
@Adamant1: do you have any basis for the claim that American English is far more widely used globally? Not particularly what I have observed. For example, educated people from India who speak English speak an English far more related to British English than American English, and I'd guess right there we are talking about a population in the hundred-million range. It is in some ways distinct, but in every way that British and American English differ from one another, it is more like the British. - Jmabel ! talk 01:38, 28 July 2024 (UTC)[reply]
@Jmabel: I don't remember the exact numbers, but speaking purely in relation to Commons, I think the top two countries contributors come from are Germany and the United States, and I'm pretty sure they mainly (if not exclusively) speak American English in Germany. I don't remember where India is on that list, but it really doesn't matter how many people from India speak British English if they only make up like 5% of contributors to begin with. Outside of that this website has a map at the top showing the usage of both globally. Assuming it's accurate sure British English takes up more land mass globally but so what? Cool that British English is more popular in Siberia by land mass. Those people are either not on Commons to begin with or mainly (if not exclusively) create categories in Russian. I'd say look at what variation of English the people from like the top 5 countries (or top 1 or 2 depending on the numbers) for editors is on here and go with that. Otherwise I think you're just losing the point in universality. --Adamant1 (talk) 02:27, 28 July 2024 (UTC)[reply]

I don't remember where India is on that list, but it really doesn't matter how many people from India speak British English if they only make up like 5% of contributors to begin with.

@Adamant1 Indians are 5% if you consider all the contributors. However, if you consider only the anglophone contributors, the percentage should be at double digits. Besides contributors, there are also significant viewers from India who watch our images via Wikipedia.

Assuming it's accurate sure British English takes up more land mass globally but so what? Cool that British English is more popular in Siberia by land mass. Those people are either not on Commons to begin with or mainly (if not exclusively) create categories in Russian.

I don't believe in this prejudice since I strongly believe in the "all men are created equal" principle. Besides, Serbians have their own language, which uses both Latin and Cyrillic scripts. Sbb1413 (he) (talkcontribsuploads) 05:23, 28 July 2024 (UTC)[reply]
there are also significant viewers from India who watch our images via Wikipedia. @Sbb1413: Sure, but from what I understand most viewers of media on Commons don't do it by way of categories. Let alone do they regularly interact with them in any meaningful way. So viewers don't really matter here.
I don't believe in this prejudice since I strongly believe in the "all men are created equal" I believe "all men are created equal" too, but not all men speak the same language and that's what we're talking about here. Not the inherent value of humans or whatever. To that end this Wikipedia article says that only 5% of people in Russia speak English to begin with. Change that to Siberia and the number of English speakers there is essentially non-existent. The point being, you could take that map I linked to say British English must be the popular variation purely based on landmass, but then large areas of that land have almost no English speakers to begin with anyway. Like sure China is huge landmass and population wise, but conversely less than 1% of the population there speaks English to begin with. So the fact that British English is more popular than American in China is statistically and (more importantly) practically meaningless. --Adamant1 (talk) 05:38, 28 July 2024 (UTC)[reply]
@Adamant1: Do I understand you well: to decide what kind of English should be the standard, you only look at the contributors to Commons? Not to the whole world?
  • If I understand the Commons scope well, the category structure we make is not only for contributors, but for everybody who wants to find media with educational content, like Wikipedians and people contributing to other Wikimedia projects, scholars, school children and business people (and many more) looking for illustrations for their papers and presentations, from all over the world. So our target group is much broader than just Commons contributors. And I think we should include and consider the whole world to decide what kind of English should be the standard on Commons.
  • Categories do matter to viewers, because I think the search engine includes category names in search results (the Commons search engine as well as Google Image). And Wikidata is there to make sure that spelling variants are included. And if not: I hope they do or are going to do so.
  • But I wonder whether we should want to have one variant of English as the standard at all on Commons. So: why is this part of the discussion even necessary? I like the current situation better in which we choose one variant per concept and use that variant throughout the category structure.
JopkeB (talk) 05:50, 28 July 2024 (UTC)[reply]
I think that's essentially my position. Like I said in the comment you're responding to, less than 1% of people in China speak English to begin with. Realistically the amount of people who view media from Commons is pretty low. Then if you consider people from China who use or interact with categories on a regular basis it's even lower than that. So I have no problem with considering the whole world to decide what kind of English should be the standard on Commons. That's literally what I'm doing here and the fact is that if you look at how many people actually speak English to begin with in places where British English is the predominant variant it's too small to be meaningful.
That doesn't even account for the percentage of computer users versus cell phone users globally either. 66% of the world's population has access to the internet, but then only 40% of that percentage use a computer and most people don't use or interact with folders on mobile. So I think we should be realistic about what we're actually talking about here. Instead of just acting like we should base this purely on inclusivity or whatever. There's nothing wrong with that, but there's no point in being inclusive when it comes to groups of people that clearly don't even exist on here in the first place. --Adamant1 (talk) 06:06, 28 July 2024 (UTC)[reply]
@Jmabel and Adamant1: Yes, India (largely) follows British English due to its colonial legacy. On one hand, I have seen some news media imposing British spellings in many organization names (like "Johnson Space Centre", "World Health Organisation"). On the other hand, I have seen some media using -ize and mdy format. The mdy format was introduced by the British, who formerly used it. The -ize spelling is either an instance of Americanization or is based on the OED spelling conventions. I generally use British English with OED and IUPAC spelling conventions except when naming Indian categories, where I would use hardline British spellings. Sbb1413 (he) (talkcontribsuploads) 04:48, 28 July 2024 (UTC)[reply]
@Sbb1413: Interesting. Am I correct to assume it largely depends on the situation? Like I assume someone who works at a call center for a company in the United States would probably prefer American English. Otherwise they would use British. Or am I wrong about that? --Adamant1 (talk) 05:07, 28 July 2024 (UTC)[reply]
  • JopkeB, I don't see this as such a big problem. We have not had much of a problem thus far with the setup of new categories. Subcategories can follow their parent or siblings as a precedent, those setting up specific new categories with a language distinction are likely to understand that. Or, we can fix anything that comes up later on. Explanation of categories can be done with header text and that easily supports multiple languages.
Where we have a problem is with renames. Particularly those deliberately enforcing language changes 'for consistency'. If we make it clear that we just don't do that then we should be good. This is especially the case for non-natives changing another language. If you don't have the knowledge, then don't edit it. Andy Dingley (talk) 20:32, 3 August 2024 (UTC)[reply]

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

New proposal

 Question How do we go on with this discussion? How can we close it? Because I see:

  1. there is no consensus about the initial proposal by Sbb1413 with three/four options (including variant D = leave it as it is now, no alterations or/and additions to the Universality Principle).
  2. someone started a discussion about one standard variant of English on Commons, without even posing the question whether we want such a thing; and there is no consensus about one prevailing variant of English either.

--JopkeB (talk) 06:30, 28 July 2024 (UTC)[reply]

@JopkeB It is a terrible decision, but there are several pros and cons of using different English variants:
Similarly, there are several pros and cons of using a single English variant for a specific topic:
This is why my proposal is to use English variants for countries whose official/primary language is English. Other countries will follow the English variant used in the main topic category. Note that the Commonwealth of Nations (excluding some members who use different variants) and the European Union are treated as single countries for the purpose of this proposal. Both organizations officially use British English, so there are possibilities that the members (except some Commonwealth members) would follow that variant. The Netherlands is a member of the EU, so it can freely use the British variant. Sbb1413 (he) (talkcontribsuploads) 03:53, 2 August 2024 (UTC)[reply]
In fact, using multiple variant categories for one topic and single variant categories for another is a huge mess. In light of technical reasons, we can use a single English variant throughout a topic, with other variants as descriptions. For example, the description of Category:Organizations of the United Kingdom would be read as:
British English: Organisations of the United Kingdom
This is much better than renaming categories to be consistent with the local English variant. Sbb1413 (he) (talkcontribsuploads) 04:06, 2 August 2024 (UTC)[reply]
Also, in the domain of space exploration, the description of Category:Astronauts from Russia would be read as:
English: Russian cosmonauts
Similarly, Category:Astronauts from China would be read as:
English: Chinese astronauts, also known as taikonauts.
--Sbb1413 (he) (talkcontribsuploads) 04:16, 2 August 2024 (UTC)[reply]
Thanks for the overview. I would like to add to "pros and cons of using a single English variant for a specific topic":
  • Not only "No problems with navigation templates" but "No problems with navigation in general", for all users, no matter what English variant is preferred by the searcher, just because of the consistency throughout a topic.
 Support I think your latest addition might be the key for a solution: using a single English variant throughout a topic, with other variants as descriptions.
--JopkeB (talk) 04:28, 2 August 2024 (UTC)[reply]
@Crouch, Swale, Laurel Lodged, Prototyperspective, and Gürbetaler: Those who had supported my proposal, what do you think of my newer proposal of using single English variant throughout a topic, with other variants as descriptions? Sbb1413 (he) (talkcontribsuploads) 05:02, 2 August 2024 (UTC)[reply]
More examples of ENGVAR descriptions are found at Category:Aluminium, Category:Apartment buildings, Category:Caesium, Category:Eggplant, Category:Sulfur etc. Sbb1413 (he) (talkcontribsuploads) 05:50, 2 August 2024 (UTC)[reply]
I think it's also fine. The issue is that there need to be machine translated category titles depending on the set uselang. For such short phrases the latest MT tech is nearly always accurate even for smaller languages and one could try this first with some of the best-working languages like Spanish. This may also allow many more people to find the WMC pages if they search the Web in their own language and indexing works well. Prototyperspective (talk) 10:43, 2 August 2024 (UTC)[reply]

The issue is that there need to be machine translated category titles depending on the set uselang. For such short phrases the latest MT tech is nearly always accurate even for smaller languages and one could try this first with some of the best-working languages like Spanish. This may also allow many more people to find the WMC pages if they search the Web in their own language and indexing works well.

I don't understand how it is related to the ENGVAR issue in Commons. A machine translator can translate any English text to another language, regardless of variant. Sbb1413 (he) (talkcontribsuploads) 11:31, 2 August 2024 (UTC)[reply]
I was just saying we're discussing the wrong thing basically, it doesn't matter which variant is used or if there is a standard naming because it can be easily machine translated. I support your proposal partly because the variant indeed doesn't matter and that's also why I don't care much about which of your proposals is implemented, it would be good to standardize/specify it and I think this translatability needs to be kept in mind when thinking about which option would be best and can also solve potential Cons like "people looking for the term in their own language may not find the category as well if another language variant is used". Prototyperspective (talk) 11:46, 2 August 2024 (UTC)[reply]
I've put this here, it's a bit tangential to this discussion: meta:Community Wishlist/Wishes/Add machine translated category titles on WMC. Prototyperspective (talk) 17:06, 4 August 2024 (UTC)[reply]
  • An inherent problem with the "Universality principle" is that it seems to assume that there is some sort of "standard" terminology used everywhere. There is not. Whatever variety of English is used, there are occasional terms that are different in different countries or regions, to the point that if we try to impose some version globally it can be a term that is not only not used but is completely unfamiliar to millions of people. Uniformity is a worthy goal as a general rule, but refusing to acknowledge occasional exceptions can create problems greater than any potential superficial appearance of benefit. -- Infrogmation of New Orleans (talk) 21:33, 3 August 2024 (UTC)[reply]
    "Standard terminology" has at least two meanings:
    1. Terminology used as a standard within a (geographic or cultural) community. Example: spelling of English words in different parts of the world.
    2. Terminology used as a standard within a system. In a system it does not matter how words are spelled, as long as the same concept is spelled the same way throughout that system. You can even give them codes instead of real words (like in Wikidata), as long as you add good descriptions in languages that are understandable by people. For me the category structure in Commons is such a system. And though I was not involved at all in the process of creating the principles, I think the Universality Principle was established for this reason. In Commons this principle is at least necessary to have "No problems with navigation", but also for communicating with other systems and for technical solutions (like templates), current and future ones. I guess this is the reason why Sbb1413 proposed to (In light of technical reasons, we can) "use a single English variant throughout a topic, with other variants as descriptions", which I support.
    That "a term that is not only not used but is completely unfamiliar to millions of people" is not relevant in a system, it is only relevant within a community. I (not a native English speaker) consider myself one of those millions of people (should perhaps be "billions") and I do not mind having to look up English terms used in Commons on a daily basis. I do not see why it is a problem for native English speakers to look up a word only now and then. JopkeB (talk) 05:30, 4 August 2024 (UTC)[reply]
    Yes, organization on Commons has become massively more US/England English centric than I thought the intention was in the early months of the project. Even the use of standard Latin names for plants and animals, one of the few non-English standards initially agreed to at the start, is widely disregarded by native English speakers refusing to look up a word only now and then. -- 17:23, 5 August 2024 (UTC) (preceding unsigned comment added by Infrogmation (talk • contribs))
    @Infrogmation: , you are not correct. The Universality Principle does NOT assume any sort of 'standard' terminology used anywhere outside of Commons. It merely determines that a common terminology is to be used within Commons. Whatever term is used, it is known that it will be unfamiliar to not only millions, but billions of people, since there are at least 6 billion people who do not speak English. It does not matter whether we use universal terminology or localized terminology--either way most people will not have a native understanding of the term. The only question is whether users only have to familiarize themselves with a single term for a topic usable across all of Commons, or they have to learn multiple different terms for the same topic to be able to navigate across Commons. Josh (talk) 19:28, 9 August 2024 (UTC)[reply]
One of the fundamental problems I have with the localized system is that it presumes that the audience for a topic by region is only from that region and thus the name of the topic should be catered to that specific population. However, topics are there for all people to access, and the audience for every topic is global. For local audiences, 'trucks in Texas', 'lorries in England', and 'camiones en Mexico' all may be the more readily understood terminology for most people living in those regions, and if those were the only users these categories were intended for, I might agree with localized naming. However, they are not: the categories are intended to be accessible by all users regardless of where they live. Thus the UP was developed to promote access by all users to all content without prejudice. Josh (talk) 22:59, 9 August 2024 (UTC)[reply]
@Joshbaumgartner I have agreed to use a consistent English variety for a given topic, regardless of country. If we want to add national English variants in a country category, we can do so using specialized description templates (like {{En-gb}}, {{En-us}}). The plain {{En}} template can also assume the local dialect, like Category:Astronauts from the Soviet Union. Sbb1413 (he) (talkcontribsuploads) 03:49, 10 August 2024 (UTC)[reply]
@Sbb1413 I agree 100% that language and language-variant specific descriptions are a great thing to add to any category. The best way is through WD, but for categories without {{Wikidata infobox}}, the templates you mentioned are very good to use. I support their use completely! Of course I would most love if we could fix category names so they were handled like WD labels, and everyone can search for and have displayed their language of choice as the category name, but that is a technical solution that we have failed to achieve in more than a decade, so I'm not holding my breath. Universality Principle is how we make do in the meantime. Josh (talk) 01:40, 11 August 2024 (UTC)[reply]
Now we'll wait another two weeks for more input. After that, we can close this discussion and tackle the categories potentially violating the Universality Principle. Sbb1413 (he) (talkcontribsuploads) 18:15, 11 August 2024 (UTC)[reply]
The categories potentially violating the Universality Principle include the subnational categories of Category:Organizations, and the national and subnational categories of Category:Train stations. According to my proposal, they all should follow consistent naming throughout the topic, and descriptions should be added to those categories to indicate the local dialects. This will allow the category name "Train stations in West Bengal" with Indian English description "Railway stations in West Bengal". Sbb1413 (he) (talkcontribsuploads) 18:20, 11 August 2024 (UTC)[reply]
I have also formally withdrawn my original proposal, since the proposal isn't feasible and it may go against the spirit of COM:LANG. --Sbb1413 (he) (talkcontribsuploads) 18:27, 11 August 2024 (UTC)[reply]
@Sbb1413 I actually would like to see the multilingual descriptions (including multi-variant) on more than just the specific region. I think it is good that on a Mexican topic, there is a Latin American Spanish description available, but I'd frankly like to see Spanish descriptions on Mongolian topics too. That is why I like using WD as it does a good job of maintaining multi-lingual labels and descriptions and making them easy to deploy on pages. Of course, all of that requires doing by people familiar with the languages in question. I agree with implementing in organizations and train stations, though for large topics with a strong variety of terminology use throughout them, they may warrant a topic-specific CfD to determine which of the various terms really should be adopted by Commons as the topic name. Josh (talk) 19:10, 11 August 2024 (UTC)[reply]
@Joshbaumgartner, Infrogmation, and Prototyperspective: Since most users have agreed with my new proposal, it might be best to move on. It is not a policy change, but a mere suggestion to address the ENGVAR issues in Commons. So far, only Crouch, Swale has problems following this proposal, as they have wished that the category names should follow national variants, like in Wikipedia. But unlike Wikipedia, which focuses mostly on articles, Commons focuses mostly on categories and files. So following a consistent variant across the entire topic hierarchy makes sense, with descriptions in national variants. Laurel Lodged has made no comments regarding the new proposal, but supported the original proposal. Sbb1413 (he) (talkcontribsuploads) 03:13, 13 August 2024 (UTC)[reply]
@Crouch, Swale and Laurel Lodged: Do you still support my original proposal of using English variants in category names? I believe my new proposal will address the English variant issues with category names. TL;DR: my current proposal is to use description templates of different English variants instead of having category names of different variants. This is more in line with the Universality Principle, which is one of the actual policies of Commons categories. w:WP:C2C is applicable for English Wikipedia categories, since we don't have criteria for speedy renaming of categories. Commons categories and English Wikipedia categories are different in many aspects, despite sharing the same "Category" namespace. Sbb1413 (he) (talkcontribsuploads) 10:00, 3 September 2024 (UTC)[reply]
Yes. Crouch, Swale (talk) 10:01, 3 September 2024 (UTC)[reply]
Yes. Laurel Lodged (talk) 16:50, 3 September 2024 (UTC)[reply]

When to {{Categorise}}?

Following this reversion correcting my misuse of {{Categorise}} I need help to interpret its ideal usage. Given that this template warns that «This is a main category requiring frequent diffusion and maybe maintenance. As many pictures and media files as possible should be moved into appropriate subcategories», how is it correct to use it to tag fairly obscure categories with very few elements (files and subcats) in them? -- Tuválkin 14:07, 18 August 2024 (UTC)[reply]

I'm not sure I agree with Verdy p here, but I see what he's after: it's hard to imagine an image that legitimately belongs directly in that category. - Jmabel ! talk 16:50, 18 August 2024 (UTC)[reply]
There’s {{Catcat}} for that or {{Metacat}}, if warranted. Since the template explanation says «This is a main category» one thinks it is rather meant for those cats into which thousands of files are dumped every day, like Category:Paris, which need a brigade of editors disseminating them around constantly. Not for something so obscure like Category:SI unit symbols (and the others where User:Verdy p has been adding {{Categorise}} in recent weeks). -- Tuválkin 10:35, 20 August 2024 (UTC)[reply]
This is not just about images but about any media and any relevant categories containing them. The "diffusion" criterion is important there, as well as its "permanence", not the criterion of "frequent" maintenance, which remains permanent, and this is what is really tracked there by the associated category. And this applies to "as many as possible" contents, which should be properly categorised (also avoiding double categorisation in a parent category when its more relevant and more precise child category is enough: this also avoids maintenance on media files to recategorize them, and helps media uploaders). verdy_p (talk) 16:52, 18 August 2024 (UTC)[reply]
Oh, you have a reason for it, after all? You should have put it in the edit summary when I asked, several times, instead of just reverting. If your goal is to annoy me, then mission accomplished, although you have been doing so since 2005, across multiple online projects. Maybe avoid me, Philippe, instead of seeking my proximity as you did recently at {{Lettercombo}}. I find your whole demeanour oddly disquieting and Commons should be large enough for us two not to bump elbows ever. -- Tuválkin 11:16, 20 August 2024 (UTC)[reply]

Exceptions to the Selectivity Principle

While the Selectivity Principle prevents us from creating categories that combine multiple topics, there are some cases where we should ignore this rule: Oceans vs. Seas, and Hills vs. Mountains. Although oceans and seas are distinct topics, there are many aspects that are equally applicable to both, so I had created Category:Oceans and seas for them. For hills vs. mountains, the distinction is mostly arbitrary and subjective (e.g. the elevations of Hills of Darjeeling district are similar to other Himalayan mountains), so I will create Category:Hills and mountains once there's a consensus for making exceptions to this principle. Sbb1413 (he) (talkcontribsuploads) 16:23, 31 August 2024 (UTC)[reply]

Although the above reasoning may seem to go against COM:Intersectional categories, the topics mentioned are not really distinct. While oceans are extremely large compared to seas, they have many features common to both, like coasts (including sea beaches), marine biology, etc. On the other hand, although both hills and mountains are Slope landforms, we have also categorized Plains under this category, so Category:Slope landforms is not suitable to group hills and mountains. Category:Highlands is also not suitable, since it can cover multiple mountain ranges and plateaus, as opposed to individual mountains/hills. Sbb1413 (he) (talkcontribsuploads) 16:31, 31 August 2024 (UTC)[reply]
So what would the exception exactly be? How would you describe it in general? JopkeB (talk) 03:43, 1 September 2024 (UTC)[reply]
I think the exception is where there is a quite clear concept, but English doesn't happen to have a single term. Sometimes we fudge this a different way: e.g. "rivers" categories often also include things too small to be properly considered a "river", and might more properly be called "rivers, creeks, and streams", but we just say "rivers" in the cat name. - Jmabel ! talk 15:35, 1 September 2024 (UTC)[reply]
@Jmabel: This is what I want to say. However, for the case of rivers, we already have Category:Watercourses, which can cover rivers, streams, creeks, and canals together. Sbb1413 (he) (talkcontribsuploads) 15:39, 1 September 2024 (UTC)[reply]
And "watercourses" is an obscure word, probably not known to even the average native-speaking college graduate, certainly not to the average native speaker without a "higher" education. (In fact, I just ran that by my girlfriend to see if she agreed, and it turned out that she didn't know the word herself!) - Jmabel ! talk 15:49, 1 September 2024 (UTC)[reply]