
News

Posted about 10 years ago by ajvincent
Bugzilla bug report #1,100,000
We used to have a little celebration for every 100,000 bug reports filed. Bugzilla hasn’t keeled over yet! But somehow, in the aftermath of the megabug, I think we forgot to plan for this one. Oops! It looks like it took us about 19 months to get here from the previous one. I’ll leave it to Gervase Markham to dig up the appropriate statistics.
Posted about 10 years ago by jmorrow
We’ve just launched a cleanup of our login and account creation dialog and reddit.com/login. Here’s a comparison between the old version and new version: Props to new engineer aurora-73 for […]
Posted about 10 years ago
Probably the oldest complaint about TLS is that its handshake is slow and that, together with the transport encryption, it has a lot of CPU overhead. This is certainly no longer true if TLS is configured correctly (even if some companies choose to ignore that). One of the most important features for improving the experience of visitors accessing your site via TLS is session resumption. Session resumption is the general idea of avoiding a full TLS handshake by storing the secret information of previous sessions and reusing it when connecting to a host the next time. This drastically reduces latency and CPU usage.

Enabling session resumption in web servers and proxies can, however, easily compromise forward secrecy. To find out why having a de facto standard TLS library (i.e. OpenSSL) can be a bad thing and how to avoid botching PFS, let us take a closer look at forward secrecy and at the current state of server-side implementations of session resumption.

What is (Perfect) Forward Secrecy?

(Perfect) Forward Secrecy is an important part of modern TLS setups. Its core idea is to use ephemeral (short-lived) keys for key exchange, so that an attacker who gains access to a server cannot use any of the keys found there to decrypt past TLS sessions they may have recorded.

If we want PFS, we must not use the server’s RSA key pair, whose public key is contained in the certificate, for key exchanges. This key pair is long-lived and will most likely outlive certificate expiration dates, as you would simply reuse the same key pair to generate a new certificate once the current one expires. If the server is compromised, it would be far too easy to locate the private key on disk or in memory and use it to decrypt recorded TLS sessions from the past.

With Diffie-Hellman key exchanges, where key generation is much cheaper, we can use a key pair exactly once and discard it afterwards.
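To make the “use once and discard” idea concrete, here is a minimal Python sketch of an ephemeral Diffie-Hellman exchange. It uses deliberately tiny toy parameters (the prime 2**127 - 1 and base 3, chosen here purely for illustration), and the helper names are hypothetical; real TLS uses standardized groups or elliptic curves and authenticates the exchange with the certificate key, which is not shown.

```python
import hashlib
import secrets

# Toy parameters: P is the Mersenne prime 2**127 - 1 and G = 3.
# Far too small for real use; chosen only so the arithmetic is easy to follow.
P = 2**127 - 1
G = 3


def new_ephemeral_keypair():
    """Generate a fresh (private, public) pair meant for a single handshake."""
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def derive_secret(own_private, peer_public):
    """Compute the shared DH value and hash it into a 32-byte session secret."""
    shared = pow(peer_public, own_private, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


def handshake():
    """Simulate one key exchange; both sides derive the same secret."""
    client_priv, client_pub = new_ephemeral_keypair()
    server_priv, server_pub = new_ephemeral_keypair()
    client_secret = derive_secret(client_priv, server_pub)
    server_secret = derive_secret(server_priv, client_pub)
    # Drop the private exponents once the handshake is done. In Python this only
    # removes the references; it does not guarantee the memory is wiped.
    del client_priv, server_priv
    return client_secret, server_secret


if __name__ == "__main__":
    c1, s1 = handshake()
    c2, s2 = handshake()
    print(c1 == s1, c2 == s2, c1 != c2)
```

Each call to handshake() yields a new secret. Since nothing long-lived enters the computation, stealing the server’s certificate key later does not reveal it.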
An attacker with access to the server can still compromise the authentication part and, using the certificate’s private key, man-in-the-middle every connection from then on, but past TLS sessions stay protected.

How can Session Resumption botch PFS?

TLS provides two session resumption features: Session IDs and Session Tickets. To better understand how these can be attacked, it is worth looking at them in more detail.

Session IDs

In a full handshake the server sends a Session ID as part of the “hello” message. On a subsequent connection the client can pass this session ID back to the server. Because both server and client have saved the last session’s “secret state” under that session ID, they can simply resume the TLS session where they left off.

To support session resumption via session IDs, the server must maintain a cache that maps past session IDs to those sessions’ secret states. The cache itself is the main weak spot: stealing its contents allows an attacker to decrypt all sessions whose session IDs it contains. The forward secrecy of a connection is thus bounded by how long the session information is retained on the server.

Ideally, your server would use a medium-sized cache that is purged daily. Purging your cache might not help, however, if the cache lives on persistent storage, as it may be feasible to restore deleted data from it. In-memory storage should be more resistant to these kinds of attacks if it turns over about once a day and ensures old data is properly overwritten.

Session Tickets

The second mechanism for resuming a TLS session is Session Tickets. This extension transmits the server’s secret state to the client, encrypted with a key known only to the server. That ticket key protects the TLS connection now and in the future, and it is the weak spot an attacker will target. The client stores its secret information for a TLS session along with the ticket received from the server.
By sending that ticket back to the server at the beginning of the next TLS connection, both parties can resume their previous session, provided the server still has access to the secret key that was used to encrypt the ticket.

We ideally want the same secrecy bounds for Session Tickets as for Session IDs. To achieve this we need to ensure that the key used to encrypt tickets is rotated about daily. Just like the session cache, it should not live on persistent storage, so that it leaves no trace.

Apache configuration

Now that we have determined how we ideally want session resumption features to be configured, we should take a look at popular web servers and load balancers to see whether that is supported, starting with Apache.

Configuring the Session Cache

The Apache HTTP Server offers the SSLSessionCache directive to configure the cache that contains the session IDs of previous TLS sessions along with their secret state. You should use shmcb as the storage type: a high-performance cyclic buffer inside a shared memory segment in RAM. It is shared between all threads or processes and allows session resumption no matter which of them handles the visitor’s request.

SSLSessionCache shmcb:/path/to/ssl_gcache_data(512000)

The example above establishes an in-memory cache via the path /path/to/ssl_gcache_data with a size of 512,000 bytes (500 KiB). Depending on the number of daily visitors, the cache might be too small (i.e. have a high turnover rate) or too big (i.e. have a low turnover rate). We ideally want a cache that turns over daily, and there is no really good way to determine the right session cache size.

What we really need is a way to tell Apache the maximum time an entry is allowed to stay in the cache before it gets overwritten. This must happen regardless of whether the cyclic buffer has actually cycled around yet, and it must be a periodic background job, to ensure the cache is purged even when there have not been any requests in a while.
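Apache offers no such setting. As a rough illustration of the behavior asked for here, the following Python sketch models a session-ID cache with a hard maximum entry age and a periodic background purge. ExpiringSessionCache and its parameters are hypothetical, and deleting a Python object does not guarantee its bytes are overwritten in memory, so this shows the policy, not a secure implementation.

```python
import threading
import time


class ExpiringSessionCache:
    """Hypothetical session-ID cache: entries are dropped once older than
    max_age seconds, whether or not anyone ever tries to resume them."""

    def __init__(self, max_age, clock=time.monotonic):
        self.max_age = max_age
        self._clock = clock  # injectable so the purge logic can be tested
        self._entries = {}   # session_id -> (created_at, secret_state)
        self._lock = threading.Lock()

    def store(self, session_id, secret_state):
        with self._lock:
            self._entries[session_id] = (self._clock(), secret_state)

    def lookup(self, session_id):
        """Return the secret state for resumption, or None (full handshake)."""
        with self._lock:
            entry = self._entries.get(session_id)
            if entry is None or self._clock() - entry[0] > self.max_age:
                return None
            return entry[1]

    def purge_expired(self):
        """Drop every entry older than max_age; return how many were removed."""
        now = self._clock()
        with self._lock:
            stale = [sid for sid, (created, _) in self._entries.items()
                     if now - created > self.max_age]
            for sid in stale:
                del self._entries[sid]
        return len(stale)

    def start_background_purge(self, interval):
        """Call purge_expired() every `interval` seconds in a daemon thread,
        so stale entries go away even when no requests arrive."""
        def loop():
            while True:
                time.sleep(interval)
                self.purge_expired()
        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread
```

With max_age=86400 and start_background_purge(interval=3600), session secrets would stay in the cache for roughly a day at most, independent of how much traffic the server sees.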
You might wonder whether the SSLSessionCacheTimeout directive can be of any help here. Unfortunately, no: the timeout is only checked when a session ID is given at the start of a TLS connection. It does not cause entries to be purged from the session cache.

Configuring Session Tickets

While Apache offers the SSLSessionTicketKeyFile directive to specify a key file that should contain 48 random bytes, it is recommended not to specify one at all. Apache will then simply generate a random key on startup and use it to encrypt session tickets for as long as it is running. The good thing about this is that the session ticket key never touches persistent storage; the bad thing is that it is never rotated. Generated once on startup, it is only discarded when Apache restarts. For most servers out there, that means they use the same key for months, if not years.

To provide forward secrecy we need to rotate the session ticket key about daily, and current Apache versions provide no way of doing that. The only way to achieve it might be to use a cron job that gracefully restarts Apache daily so that a new key is generated. That does not sound like a real solution, though, and nothing ensures the old key is properly overwritten.

Changing the key file while Apache is running does not help either; you would still need to gracefully restart the service to apply the new key. And do not forget that if you use a key file, it should be stored on a temporary file system like tmpfs.

Disabling Session Tickets

Although disabling session tickets will undoubtedly have a negative performance impact, for the time being you will need to do that in order to provide forward secrecy:

SSLOpenSSLConfCmd Options -SessionTicket

Ivan Ristic adds that to disable session tickets for Apache using SSLOpenSSLConfCmd, you have to be running OpenSSL 1.0.2, which has not been released yet.
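The SSLOpenSSLConfCmd line above ultimately sets an OpenSSL option, SSL_OP_NO_TICKET; as far as I can tell, Nginx’s ssl_session_tickets off and HAproxy’s no-tls-tickets (discussed below) set the same flag. Python’s ssl module, which also wraps OpenSSL, exposes it as ssl.OP_NO_TICKET. A minimal sketch of a server context with tickets disabled:

```python
import ssl

# A server-side context, like the one a TLS-terminating process would create.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Set SSL_OP_NO_TICKET: the server stops issuing RFC 5077 session tickets,
# so TLS 1.2 resumption has to fall back to session IDs and the server cache.
ctx.options |= ssl.OP_NO_TICKET

print(bool(ctx.options & ssl.OP_NO_TICKET))
```

This only shows where the switch lives; certificates, ciphers and the rest of a real server configuration are omitted.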
If you want to disable session tickets with earlier OpenSSL versions, Ivan has a few patches for the Apache 2.2.x and Apache 2.4.x branches.

To securely support session resumption via tickets, Apache should provide a configuration directive to specify the maximum lifetime of session ticket keys, at least when they are auto-generated on startup. That would allow us to simply generate a new random key and overwrite the old one daily.

Nginx configuration

Another very popular web server is Nginx. Let us see how it compares to Apache when it comes to setting up session resumption.

Configuring the Session Cache

Nginx offers the ssl_session_cache directive to configure the TLS session cache. The cache type should be shared so that the cache is shared between all worker processes:

ssl_session_cache shared:SSL:10m;

The line above establishes an in-memory cache with a size of 10 MB. Again, we have no real idea whether 10 MB is the right size for the cache to turn over daily. Just like Apache, Nginx should provide a configuration directive that purges cache entries automatically after a certain time. Any entries not purged properly could simply be read from memory by an attacker with full access to the server.

You guessed right: the ssl_session_timeout directive again only applies when trying to resume a session at the beginning of a connection. Stale entries will not be removed automatically after they time out.

Configuring Session Tickets

Nginx allows specifying a session ticket key file using the ssl_session_ticket_key directive, and again you are probably better off not specifying one and letting the service generate a random key on startup. That key will never be rotated and might be used to encrypt session tickets for months, if not years. Nginx, too, provides no way to rotate keys automatically. Reloading its configuration daily via a cron job might work, but it does not come close to a real solution either.
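Coming back to the 10 MB ssl_session_cache example: to see why picking a size is guesswork, here is a back-of-the-envelope Python sketch. It assumes the figure from the Nginx documentation that one megabyte of cache holds about 4000 sessions, and that new sessions arrive at a steady rate; the function names are hypothetical.

```python
# Nginx documentation: "one megabyte of the cache contains about 4000 sessions".
SESSIONS_PER_MB = 4000


def cache_capacity(size_mb, sessions_per_mb=SESSIONS_PER_MB):
    """Approximate number of sessions a shared cache of size_mb can hold."""
    return size_mb * sessions_per_mb


def turnover_days(size_mb, new_sessions_per_day, sessions_per_mb=SESSIONS_PER_MB):
    """Roughly how long an entry survives in a full cache before it is evicted,
    assuming new sessions arrive at a steady rate."""
    return cache_capacity(size_mb, sessions_per_mb) / new_sessions_per_day


def size_for_daily_turnover(new_sessions_per_day, sessions_per_mb=SESSIONS_PER_MB):
    """Cache size in MB that would turn over about once a day."""
    return new_sessions_per_day / sessions_per_mb


if __name__ == "__main__":
    for per_day in (10_000, 40_000, 100_000):
        print(per_day, turnover_days(10, per_day))
```

A 10 MB cache holds roughly 40,000 sessions, so at 10,000 new sessions per day an entry may linger for about four days, while at 100,000 per day it is gone in well under a day. Retention depends on traffic rather than on the clock, which is exactly the problem described above.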
Disabling Session Tickets

The best you can do to provide forward secrecy to visitors is thus, again, to switch off session ticket support until a proper solution is available:

ssl_session_tickets off;

HAproxy configuration

HAproxy, a popular load balancer, suffers from basically the same problems as Apache and Nginx. All of them rely on OpenSSL’s TLS implementation.

Configuring the Session Cache

The size of the session cache can be set using the tune.ssl.cachesize directive, which accepts a number of “blocks”. The HAproxy documentation tries to be helpful and explains how many blocks are needed per stored session, but again we cannot ensure at least daily turnover. We would need a directive to purge entries automatically, just as for Apache and Nginx. And yes, the tune.ssl.lifetime directive does not affect how long entries are kept in the cache.

Configuring Session Tickets

HAproxy does not allow configuring session ticket parameters. It implicitly supports the feature because OpenSSL enables it by default. HAproxy will thus always generate a session ticket key on startup and use it to encrypt tickets for the whole lifetime of the process. A graceful daily restart of HAproxy might be the only way to trigger key rotation. This is pure assumption, though; please do your own testing before using it in production.

Disabling Session Tickets

You can disable session ticket support in HAproxy using the no-tls-tickets directive:

ssl-default-bind-options no-sslv3 no-tls-tickets

A previous version of the post said it would be impossible to deactivate session tickets. Thanks to the HAproxy team for correcting me!

Session Resumption with multiple servers

If you have multiple web servers that act as front-ends for a fleet of back-end servers, you unfortunately cannot get away with leaving the session ticket key file unspecified and relying on a dirty hack that reloads the service configuration at midnight, because every front-end must be able to decrypt the tickets issued by the others.
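Whether you use SSLSessionTicketKeyFile on Apache, ssl_session_ticket_key on Nginx (whose documentation, as I understand it, also describes a file of 48 random bytes), or a key shared by a fleet of front-ends, something has to produce those bytes and replace them regularly. A minimal Python sketch under stated assumptions: write_ticket_key is a hypothetical helper, the target path would live on a tmpfs mount, and the server still needs the graceful restart or reload described above to pick up the new key.

```python
import os
import secrets
import tempfile

TICKET_KEY_BYTES = 48  # size of the key file described for SSLSessionTicketKeyFile


def write_ticket_key(path):
    """Replace the key file at `path` with 48 fresh random bytes (mode 0600)."""
    tmp_path = path + ".tmp"
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(secrets.token_bytes(TICKET_KEY_BYTES))
        f.flush()
        os.fsync(f.fileno())
    # os.replace() swaps the files atomically on POSIX but does NOT scrub the
    # old key's bytes, which is one more reason to keep this file on tmpfs.
    os.replace(tmp_path, path)


if __name__ == "__main__":
    # Demo in a throwaway directory; in production `path` would be something
    # like the hypothetical /run/apache2/ticket.key on a tmpfs mount.
    with tempfile.TemporaryDirectory() as d:
        key_path = os.path.join(d, "ticket.key")
        write_ticket_key(key_path)
        print(os.path.getsize(key_path))
```

Run daily from cron, followed by the graceful restart, this gives you a fresh key each day; it does not solve the underlying lack of in-process rotation.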
Sharing a session cache between multiple machines using memcached is possible, but with session tickets you “only” have to share one or more session ticket keys, not the whole cache. Clients would take care of storing and discarding tickets for you.

Twitter wrote a great post about how they manage multiple web front-ends and distribute session ticket keys securely to each of their machines. I suggest reading it if you are planning a similar setup and want to support session tickets to improve response times. Keep in mind, though, that Twitter had to write their own web server to handle forward secrecy in combination with session tickets properly, and this might not be something you want to do yourself.

It would be great if OpenSSL, or all of the popular web servers and load balancers, started working towards providing forward secrecy by default, so that server admins could get rid of custom front-ends or dirty hacks to rotate keys.
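The key handling described for such setups boils down to a small policy: the newest key encrypts new tickets, a few older keys are kept only to decrypt recently issued tickets, and anything older is discarded. The following Python sketch models only that policy. TicketKeyRing, its parameters, and the 16-byte key name and 32-byte key sizes are assumptions for illustration; distributing keys to machines and actually encrypting tickets are left out.

```python
import secrets
import time


class TicketKeyRing:
    """Hypothetical key set shared by all front-ends: the newest key encrypts new
    tickets, older keys only decrypt recent ones, and the oldest are forgotten."""

    def __init__(self, rotation_interval, keys_to_keep, clock=time.time):
        self.rotation_interval = rotation_interval
        self.keys_to_keep = keys_to_keep
        self._clock = clock  # injectable for testing
        self._keys = []      # (created_at, key_name, key), newest first
        self.rotate()

    def rotate(self):
        key_name = secrets.token_bytes(16)  # tells the server which key encrypted a ticket
        key = secrets.token_bytes(32)
        self._keys.insert(0, (self._clock(), key_name, key))
        del self._keys[self.keys_to_keep:]  # forget keys beyond the retention limit

    def rotate_if_due(self):
        """Rotate when the newest key is older than rotation_interval."""
        if self._clock() - self._keys[0][0] >= self.rotation_interval:
            self.rotate()
            return True
        return False

    def encryption_key(self):
        _, key_name, key = self._keys[0]
        return key_name, key

    def decryption_key(self, key_name):
        for _, name, key in self._keys:
            if name == key_name:
                return key
        return None  # unknown or expired key: fall back to a full handshake
```

A key is forgotten roughly rotation_interval × keys_to_keep after it was created, so with hourly rotation and 24 keys, a stolen key set exposes at most about a day of tickets.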
Posted about 10 years ago
31 changesets, 56 files changed, 920 insertions, 250 deletions

Extension occurrences: cpp 15, js 8, html 4, h 4, c 3, xul 2, mn 2, list 2, ini 2, css 2, build 2, xml 1, xhtml 1, svg 1, sjs 1, mm 1, jsm 1, java 1, conf 1

Module occurrences: browser 16, layout 9, dom 6, content 5, netwerk 4, js 3, gfx 3, mobile 2, xpfe 1, widget 1, media 1, intl 1, editor 1, docshell 1

List of changesets:
Matthew Gregan: Bug 1085175. r=roc, a=dveditz - 9cd882996cbe
Jordan Santell: Bug 1078539 - Add a doorhanger widget for the developer edition notification to browser console, developer tools, webide and responsive design mode. r=jryans, a=lmandel - e7f8aa528841
James Willcox: Bug 1097126 - Restrict MediaCodec backend to Android 5.0 and higher r=blassey a=lmandel - 7dfbe52d1a2b
James Willcox: Backed out changeset 7dfbe52d1a2b a=lmandel - b0fea8a116aa
James Willcox: Bug 1097276 - Disable MediaCodec backend r=edwin a=lmandel - cd70fa61662a
Chenxia Liu: Bug 1093619 - Don't display onboarding screen for Webapp profiles. r=margaret, a=lmandel - 5cbc59a67d8c
James Willcox: Bug 1097276 - Disable fragmented MP4 support on Android r=rillian a=lmandel - f7dd649eb2f6
Richard Newman: Bug 1095298 - Ignore 'C' locale when initializing nsLocaleService on Android. r=emk, a=lmandel - 43fd2720be09
Olli Pettay: Bug 1096263 - XMLHttpRequest.send({}) should not throw. r=bz, a=lmandel - 9e57cec588a9
Olli Pettay: Bug 1096263 - XMLHttpRequest.send({}) should not throw, tests. r=bz, a=lmandel - 0197e9eb324f
Botond Ballo: Bug 1068961 - Reset clip rect for color layers. r=roc, a=lmandel - 9f14f2af8bf7
Robert O'Callahan: Bug 1084672 - Call NotifyDidPaint from the refresh driver to ensure it gets called regardless of whether OMTC is used or not. r=mattwoodrow, a=lmandel - 2b08e1cb3c6f
Matthew Gregan: Bug 1096716 - Delay buffer frame calculation in WMF audio decoder until after UpdateOutputType to avoid using obsolete rate/channel values. r=cpearce, a=lmandel - 49f10dbc7d69
Boris Zbarsky: Bug 1090616 - Don't assume that the nodes we just pasted are still in the DOM, because mutation listeners suck like that. r=smaug, a=lmandel - 609915862295
Steven Michaud: Bug 1092855 - Work around bad interaction between jemalloc and Apple uninitialized memory bug. r=spohl, a=lsblakk - 4bdf71e69d10
Matthew Gregan: Bug 1092859 - Always use soft volume in WinMM cubeb backend. r=padenot, a=lmandel - f2dd9f2a084a
Steven Michaud: Bug 1086977 - Facebook's old "Facebook Photo Uploader" (fbplugin) crashes on load on OS X Yosemite. r=bsmedberg a=lmandel - e588ff4e326e
Neil Rashbrook: Bug 1070768 - Fix up more references to XPFE's autocomplete.css. r=Ratty, a=lmandel - aa474c125c53
Gijs Kruitbosch: Bug 1096787 - Copy new logins.json database when using fx reset. r=MattN, a=lsblakk - b718e8c0d423
Randell Jesup: Bug 1080312 - Update iteration code from upstream. r=jesup, a=abillings - 4bb1c6116c39
Gijs Kruitbosch: Bug 1096695 - hardcode strings for beta to give more info about sslv3 being dead, r=bz,gavin, a=lmandel - d585e4e50468
Gijs Kruitbosch: Bug 1096695 - fix test failures in test_aboutCrashed.xul, rs=bustage,a=bustage - 117eb4e49c72
Bas Schouten: Bug 1096913 - Remove the simple cache entry when replacing the cache entry. r=mwu, a=lmandel - e1ee2331bd12
Kannan Vijayan: Bug 1081850 - Check for resolve hooks when optimizing no-such-prop operations in baseline. r=jandem, a=lmandel - e685be9bd4d6
Mike Hommey: Bug 1096651 - Avoid using random bits when determining SSE3/SSE4 availability for the JIT. r=luke, a=lmandel - 3e5cb63660bd
L. David Baron: Bug 1086937 - Patch 0: Add missing null check of root element so this patch series doesn't expose a crash in layout/style/crashtests/472237-1.html. r=birtles, a=lmandel - 4ccd3e117f5d
Robert O'Callahan: Bug 1092842 - When setting cliprects on background color display items, don't shrink them to exclude opaque borders (unless there's nonzero border-radius). r=mattwoodrow, a=lmandel - 19296c34b1ca
Robert O'Callahan: Bug 1097437 - Work around Quartz bug where corners of stroked rects don't get a solid color when they should. r=jrmuizel, a=lmandel - 9e4c3c78fe01
Steven Michaud: Bug 1017595 - Links from external applications sometimes fail to open when Firefox is hidden. r=spohl, a=lmandel - 89c3e0133233
Robert O'Callahan: Reftest manifest changes that were incorrectly landed as part of Bug 1096181, but should have been part of Bug 1097437 or Bug 1092842. a=orange - c486cd17bebb
Jordan Santell: Bug 1078539 - Disable dev edition promo banner in beta temporarily. r=jryans, a=lmandel - 1242fc159d04
Posted about 10 years ago by Yunier J
A few days ago we had the pleasure of sharing with you an important piece of news for the future of Firefox: the speed improvements in our favorite browser’s engine when running JavaScript. For that article I was waiting for a reply from one of its developers, but given the importance of the good news, it was published anyway.

Question (Q). The last question in our previous conversation was about the Octane score, and today you have reached your goal: we are now number one. Which aspects of Ion Monkey have you improved to reach this incredible speed when running JavaScript (JS)?

Answer (A). We have made many improvements to Ion Monkey, but also to other parts of the JS engine. Enabling Generational Garbage Collection has helped us a lot with Octane. This article published by Nicholas Nethercote explains this feature in more detail. We have also applied minor optimizations that help us get better scores.

Q. How do these changes affect Firefox end users?

A. These changes will make many things much faster, especially when running JS applications that need a lot of CPU processing. Examples are games, emulators and pdf.js, where JS performance is extremely important.

Q. Could you add Linux tests to AWFY? The current platforms only include Windows, Mac, Android and Firefox OS.

A. Yes, we could add Linux! I think the Linux scores should be very similar to those on Mac. We will add this platform in the future.

Q. What do Firefox (Shell) and Firefox (PGO) mean on AWFY?

A. Firefox (Shell) is the JS shell, a small standalone command-line application that SpiderMonkey developers use. It is very useful because we don’t have to build the whole browser when we work on the JS engine. You can find more information about it on the project page. Firefox (PGO) is a Windows build that uses profile-guided optimization. All our official builds use PGO. Firefox built with PGO is faster than without PGO in many cases.

Q. It’s wonderful that Firefox is leading in speed again, as it was several years ago. It’s like a present for its 10th anniversary. Tell me, how do you at Mozilla feel about this achievement?

A. We are very excited about all our performance improvements, because it took a lot of effort to achieve them. But we won’t stop here; we will keep finding new ways to make Firefox faster.

Jan de Mooij: Thank you very much for writing about what we do at Mozilla.

For now, all that is left for us is to keep following the engine’s progress, as we did last year, and hope it advances even further, always trusting Mozilla and its developers, who have never let us down.

Visit Are We Fast Yet?
Posted about 10 years ago by patrickfinch
Many people don’t know/didn’t realise/don’t care/already know/already know and didn’t care, but I am back full time with Mozilla after a hiatus of a number of months. In fact, I jumped at the chance to join Mozilla’s Content Services team, and after a fairly short conversation with Darren, Mozilla VP of Content Services, I knew it was what I wanted to do.

I feel very happy to be back, and I wanted to put a few thoughts on paper about my motivation. I think back to a conversation I had earlier this year in Barcelona with Alina, someone who always expands my thinking. I believe in the web. I believe in Mozilla’s efforts to maintain its importance in how the Internet is developed and consumed. I believe it is entirely preferable to the reemergence of technological islands, walled gardens, empires (call them what you will). And yet, from a different perspective, isn’t the web evidently facilitating such empires anyway? Taking it further: is the web, this incredible creation that has enriched our lives in ways we could not previously have imagined, also an agent of economic and cultural imperialism, in the same way that free trade and liberalised capital markets arguably were in the 1980s and 1990s? I realise that many Mozillians will have an inherent faith in market solutions, and I certainly believe in free trade…up to a point.

People will identify Content Services with advertising. And how do I feel about advertising in general? About the same way as when I first read J.K. Galbraith on the subject, over 20 years ago. Advertising troubled Galbraith, even in 1958 when he first published The Affluent Society. Advertising gives cultural force to the means of production. Or as he put it, “wants are increasingly created by the process by which they are satisfied”. That is, advertising is the means by which supply creates demand. It allows capital to influence our psyche and creates new cultural barriers to market entry.
In 2014, it’s hard to imagine an economy without advertising. And while I do not entirely share Naomi Klein’s wholly negative views on what brands mean, I do find the idea that so many of our cultural signifiers are created with the purpose of persuading us to consume x in preference to y more than a little uncomfortable.

Given that perspective, why Content Services? Well, Content Services is not all about advertising. Content Services will help give Mozilla a new voice for its community and with its users. That said, the most important thing for Content Services is advertising, because advertising is the most important economic activity on the web, by a long way.

Look at what advertising has enabled on the Web. How much content is free to consume as a consequence of federated advertising? Many Firefox users choose to block adverts, and other people round on those users for failing to honour the implicit contract between the publisher and the reader. I am not sure I subscribe to that point of view entirely, but to fail to view advertising as an exchange of value between the user and the website is to be disingenuous. And it is something that the web does extraordinarily well: at scale, and on an aggregated basis. It has empowered the user, the search engine and the social network, and it has wreaked havoc on publishers.

Almost every user of the web enjoys a huge consumer surplus. That is, they would pay far more for the web than it costs them (once you’ve paid for access to the network, you “pay” almost nothing to use the web). And some consumers enjoy a much larger surplus than others. That is, richer consumers, who have a higher propensity to pay, transmit an effective subsidy to poorer consumers, who would have a lower propensity to pay. And this aggregated arbitrage is a good thing. Generally. Except that it has given rise to incalculably powerful and valuable empires.
These empires might be defined as the ones who own the user’s identity: the ones you log in to, who know the most about you, about what you do, who you know and what you read. Empires which extract value from all of the industries they touch, from online publishers to mobile network operators and ISPs. I must stress none of this is a criticism of a Google or a Facebook: they have delivered huge utility and have successfully exploited their market positions. But it is notable that a company with no significant patents or copyrights, nor indeed revenues, and which employed a reported 55 people could be valued at $19Bn. It is reasonable to suppose, under such circumstances, that there are major externalities generated by this business, or that this business is a significant free rider, as almost all internet businesses are: something those of us who support net neutrality implicitly agree is a good thing (I do not intend the term pejoratively).

What are the externalities? As we’re fond of telling each other, if you’re not paying, you’re the product. The price we pay is our attention and exposure to adverts, and knowledge about ourselves. We are being broken down, analysed, reassembled as segments and profiles, tracked as individuals and as sub-markets and, yes, spied upon. Some people are relaxed about this, perhaps feeling that they have nothing to hide, and besides, they haven’t even come for the Socialists yet…

What’s more, the cultural impact is abysmal. In the old world, advertising inventory was finite, confined to the space for adverts on billboards, in newspapers and so on. When the Mozilla community created its iconic New York Times advert, it was an incredible demonstration of the power of a community: placing such an advert cost real money. But in the online world, inventory is flexible, and theoretically infinite. You can grow your inventory by retaining users and by getting more clicks.
And you do this by writing clickbait headlines, by instrumenting your users, by taking your publication downmarket, by designing your site so that one article extends over multiple webpages, and so on and so forth. The effects on our culture are obvious. The effects on our psyche, we’re only just starting to understand.

And then there is a battle over online privacy. Or perhaps, more aptly, an arms race. Privacy measures are met with counter-measures. Tracking cookies, which still seem important today, may seem positively anodyne in years to come. The more intimacy we gain with the internet, and the more capabilities it assumes, the deeper this problem becomes.

So, where does all this leave us? Well, there is another Mozillian I would mention who has frequently inspired me: former CEO John Lilly. Almost exactly four years ago, John gave a talk for the House of Commons, in which he presciently suggested that, just as we are troubled by the Orwellian aspects of the internet, so too should we be aware of the dangers of a culture that is amusing us into bovine submission. John is a man who reads books, and as he points out, the internet is as much Brave New World as 1984. John also spelled out the importance of mass participation in creating counter-measures to this. Actually, just go and read his post again, if you haven’t already.

Advertising on the web is a problem: it risks trivialising our culture, creating a mass surveillance system and supporting new forms of digital empires. And yet, it is better than the alternatives: all this economic value being pushed to proprietary technology platforms. And it is in danger: in danger of becoming unpalatable in a modern democracy, and of being superseded by proprietary technologies with even worse consequences. That is why Mozilla has to act, and why it is entirely appropriate that we involve ourselves in this industry.
It is why we should conceive and build solutions to these problems, and look to empower all parts of the internet ecosystem that generate value for the consumer. This problem is our problem. We must not just try to wish it out of existence.

Our first duty is clear: it is to the Firefox user, and the trust they have in Mozilla. It would not be right for us to send our users to whichever service on the internet and rule the consequences for them out of scope (and nor do we). We build tools that put Firefox users in charge of their experience. But we must help instantiate the rest of the world we want to see, bringing advertisers and publishers who share these values into the Mozilla community. We will understand their needs, and where they are transparent, where they scale and support heterogeneity, where they offer a reasonable, knowable and workable exchange of value, we should find ways to facilitate them. Until that happens, the concentration of power on the internet will only continue. And honestly, who else but Mozilla is going to address this problem?

And more important to me than any of this is to be working side-by-side again with my many wonderful friends at Mozilla.
Posted about 10 years ago by Daniel Stenberg
(Recap: I founded the curl project, and I am still the lead developer and maintainer.)

When asked to get a URL, curl sends the output to stdout by default. You can of course easily change this behavior with options, or just by using your shell’s redirect feature, but without any option it’ll spew it out to stdout. If you’re invoking the command line on a shell prompt, you’ll get to see the response as soon as it arrives.

I decided curl should work like this, and it was a natural decision I made back around 1997, when I worked on the predecessors that would later turn into curl. On Unix systems there’s a common mantra that “everything is a file”, but in fact also that “everything is a pipe”. You accomplish things on Unix by piping the output of one program into the input of another. Of course I wanted curl to work as well as the other components and to blend in with the rest. I wanted curl to feel like cat, but for a network resource. And cat is certainly not the only pre-curl command that writes to stdout by default; they are plentiful.

And then, once I had made that decision and released curl for the first time on March 20, 1998, the call was made. The default was set. I will not change a default and hurt millions of users. I would rather continue to be questioned by newcomers, but now at least I can point to this blog post! :-)

About the wget rivalry

As I mention in my curl vs wget document, a very common comment to me about curl as compared to wget is that wget is “easier to use” because it needs no extra argument in order to download a single URL to a file on disk.
I get that: if you type the full commands by hand, you’ll type about three fewer keys writing “wget” instead of “curl -O”. On the other hand, if this is an operation you do often and you care so much about saving key presses, I would suggest you make an alias anyway that is even shorter, and then the number of options for the command really doesn’t matter at all anymore.

I put that argument in the same category as the people who argue that wget is easier to use because you can type it with your left hand only on a qwerty keyboard. Sure, that is indeed true, but I read it more as someone trying to come up with a reason when in reality there’s another one underneath. Sometimes that other reason is a philosophical one about preferring GNU software (which curl isn’t) or software licensed under the GPL (which wget is), or simply that wget is what they’re used to: they know its options and recognize or like its progress meter better.

I enjoy our friendly competition with wget, and I seriously and honestly think it has made both our projects better. I like that users can throw arguments in our face like “but X can do Y”, where X alternates between curl and wget depending on which camp you talk to. I also really like wget as a tool and I am an occasional user of it, just like most Unix users. I contribute to the wget project as well, both with code and with general feedback. I consider myself a friend of the current wget maintainer as well as of former ones.
Posted about 10 years ago by Andreas
WebRTC is one of the most exciting things to happen to the Web in years: it has the potential to bring instant voice and video calling to anyone with a browser, finally unshackling us from proprietary plugins and installed apps. Firefox, Chrome, and Opera already support WebRTC, and Microsoft recently announced future support. Unfortunately, the full potential of the WebRTC ecosystem has been held back by a long-running disagreement about which video codec should be mandatory to implement. The mandatory-to-implement audio codecs were chosen over two years ago with relatively little contention: the legacy codec G.711 and Opus, an advanced codec co-designed by Mozilla engineers. The IETF RTCWEB Working Group has been deadlocked for years over whether to pick VP8 or H.264 for the video side. Both codecs have merits. On the one hand, VP8 can be deployed without having to pay patent royalties. On the other hand, H.264 has a huge installed base in existing systems and hardware. That is why we worked with Cisco to develop their free OpenH264 plugin, and as of October this year, Firefox supports both H.264 and VP8 for WebRTC. At the last IETF meeting in Hawaii, the RTCWEB working group reached strong consensus to follow in our footsteps and make support for both H.264 and VP8 mandatory for browsers. This compromise was put forward by Mozilla, Cisco and Google. The details are a little bit complicated, but here’s the executive summary:

- Browsers will be required to support both H.264 and VP8 for WebRTC.
- Non-browser WebRTC endpoints will be required to support both H.264 and VP8. However, if either codec becomes definitively royalty-free (with no outstanding credible non-RF patent claims), then endpoints will only have to implement that codec.
- “WebRTC-compatible” endpoints will be allowed to implement either codec, both, or neither.

See the complete proposal by Mozilla Principal Engineer Adam Roach here.
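The executive summary amounts to a small decision table. Below is a minimal Rust sketch of it, based only on the summary as written here. `Codec`, `Endpoint`, `required_codecs`, `is_conformant` and `common_codec` are hypothetical names. The sketch reads the royalty-free clause as applying to non-browser endpoints, because that is how the summary is worded; the full proposal is the authority on its exact scope.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Codec {
    H264,
    Vp8,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endpoint {
    Browser,
    NonBrowser,
    Compatible, // "WebRTC-compatible": no video codec is mandatory
}

/// Codecs an endpoint must implement. `royalty_free` names the codec, if any,
/// that has become definitively royalty-free.
fn required_codecs(kind: Endpoint, royalty_free: Option<Codec>) -> Vec<Codec> {
    match (kind, royalty_free) {
        (Endpoint::Browser, _) => vec![Codec::H264, Codec::Vp8],
        (Endpoint::NonBrowser, Some(c)) => vec![c],
        (Endpoint::NonBrowser, None) => vec![Codec::H264, Codec::Vp8],
        (Endpoint::Compatible, _) => Vec::new(),
    }
}

/// Does an endpoint implementing `supported` meet its obligations?
fn is_conformant(kind: Endpoint, royalty_free: Option<Codec>, supported: &[Codec]) -> bool {
    required_codecs(kind, royalty_free)
        .iter()
        .all(|c| supported.contains(c))
}

/// Two endpoints can negotiate video only if they share at least one codec.
fn common_codec(a: &[Codec], b: &[Codec]) -> Option<Codec> {
    a.iter().copied().find(|c| b.contains(c))
}
```

A conformant browser always holds both codecs. As a result, `common_codec` finds a match against any endpoint that implements at least one of them, while an endpoint supporting neither has nothing to negotiate with.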
There are still a few procedural issues to resolve, but given the level of support in the room, things are looking good. We believe that this compromise is the best thing for the Web at this time: it lets us move forward with confidence in WebRTC interoperability, and it allows people who, for one reason or another, really can’t implement one of these two codecs to be “WebRTC-compatible” and know they can interoperate with any WebRTC endpoint. This is an unmitigated win for users and Web application developers, as it provides broad interoperability within the WebRTC ecosystem. It also puts a stake in the ground that what the community really needs is a codec that everyone agrees is royalty-free, and it provides a continuing incentive for proponents of each codec to work towards this target. Mozilla has been working for some time on such a new video codec, which tries to avoid the patent thickets around current codec designs while surpassing the quality of the latest royalty-bearing codecs. We hope to contribute this technology to an IETF standardization effort following the same successful pattern as with Opus.
Posted about 10 years ago by stak
Have you ever had the experience of trying to write a document in MS Word (or OpenOffice/LibreOffice) while it keeps "correcting" your formatting to something you don't want? The last time I experienced that was about a year ago, and that was when I decided "screw it, I'll just write this in HTML instead". That was a good decision. Pretty much anything you might want to use a word processor for, you can do in HTML, and oftentimes it's simpler. Sure, there's a bit of a learning curve if you don't know HTML, but that's true for anything. Now any time I need to create "a document" (a letter, random notes or signs to print, etc.) I always do it in HTML rather than LibreOffice, and I'm the happier for it. I keep all my data in git repositories, so it's a bonus that these documents are now in a plaintext format rather than a binary blob. I realized that this is probably part of a trend: a lot of people I know nowadays do "PowerPoint" presentations using web technologies such as reveal.js. I haven't seen many people comment on using web tech for word processing, but I know I do it. The only big "office suite" thing left is the spreadsheet. It would be awesome if somebody wrote a drop-in JS spreadsheet library that you could include in an HTML page to instantly turn a table element into a spreadsheet. I'm reminded of this old blog post by Joel Spolsky: How Trello is different. He talks about how most of the people who use Excel really just use it because it provides a table format for entering things, rather than for its computational ability. HTML already provides that, but whenever I've tried doing that I've found the markup/content ratio too high, so it always seemed like a pain. It would be nice to have a WYSIWYG tool that let you build a table (or spreadsheet) and import/export it as raw HTML that you can publish, print, share, etc.
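The export half of that wish is simple to sketch. The hypothetical Rust function below is not an existing tool or library. It renders rows of cells as a plain HTML `<table>`, treats the first row as headers, and escapes the characters HTML treats specially. It also shows where the high markup/content ratio comes from: every cell needs its own pair of tags.

```rust
/// Escape the characters that are special in HTML text and attribute values.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            _ => out.push(ch),
        }
    }
    out
}

/// Render rows as an HTML table: the first row as <th> headers,
/// the remaining rows as <td> data cells.
fn to_html_table(rows: &[Vec<&str>]) -> String {
    let mut html = String::from("<table>\n");
    for (i, row) in rows.iter().enumerate() {
        let tag = if i == 0 { "th" } else { "td" };
        html.push_str("  <tr>");
        for cell in row {
            html.push_str(&format!("<{0}>{1}</{0}>", tag, escape(cell)));
        }
        html.push_str("</tr>\n");
    }
    html.push_str("</table>\n");
    html
}
```

The result is ordinary markup that can be published, printed, or kept in a git repository next to the other HTML documents.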
As an addendum, that blog post by Joel also introduced me to the concept of unshipped code as "inventory", which is one of the reasons I really hate finding old bugs sitting around in Bugzilla with perfectly good patches that never landed!
Posted about 10 years ago
There has been a lot of discussion lately about Rust’s allocator story, and in particular our relationship to jemalloc. I’ve been trying to catch up, and I wanted to try to summarize my understanding and explain for others what is going on. I am trying to be as factually precise in this post as possible. If you see a factual error, please do not hesitate to let me know.

The core tradeoff

The story begins, like all interesting design questions, with a trade-off. The problem with trade-offs is that neither side is 100% right. In this case, the trade-off has to do with two partial truths:

It is better to have one global allocator than two. Allocators like jemalloc, dlmalloc, and so forth are all designed to be the only allocator in the system. Of necessity they permit a certain amount of “slop”, allocating more memory than they need so that they can respond to requests faster, or amortizing the cost of metadata over many allocations. If you use two different allocators, you are paying those costs twice. Moreover, the allocator tends to be a hot path, and you wind up with two copies of it, which leaves less room in the instruction cache for your actual code.

Some allocators are more efficient than others. In particular, the default allocators shipped with libc on most systems tend not to be very good, though there are exceptions. One particularly good allocator is jemalloc. In comparison to the default glibc or Windows allocator, jemalloc can be noticeably more efficient, both in performance and in memory use. Moreover, jemalloc offers an extended interface that Rust can take advantage of to gain even more efficiency (for example, by specifying the size of a memory block when it is freed, or by asking to reallocate memory in place when possible).

Clearly, the best thing is to use just one allocator that is also efficient.
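To see why the extended interface matters, here is a minimal sketch in today's Rust. It uses the stable `std::alloc::GlobalAlloc` trait and the `#[global_allocator]` attribute, which arrived well after this post. Treat it as an illustration of the idea rather than of the Rust of the time. `Counting`, `BYTES_ALLOCATED`, `BYTES_FREED` and `totals` are hypothetical names. The wrapper forwards to the system allocator and keeps byte counts. The interesting part is `dealloc`: Rust hands the allocator the block's `Layout` (size and alignment) on every free. That is exactly the information a sized-deallocation API can exploit, and C's `free(ptr)` never receives it.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards to the platform allocator and keeps running byte totals.
/// Both counters only ever grow, so a before/after reading of one
/// operation cannot be reduced by frees happening on other threads.
struct Counting;

static BYTES_ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static BYTES_FREED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = unsafe { System.alloc(layout) };
        if !p.is_null() {
            BYTES_ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        }
        p
    }

    // Unlike C's free(ptr), the size arrives with the pointer.
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        BYTES_FREED.fetch_add(layout.size(), Ordering::Relaxed);
        unsafe { System.dealloc(ptr, layout) }
    }

    // Forward realloc so the system allocator gets the chance to resize in place.
    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        let p = unsafe { System.realloc(ptr, layout, new_size) };
        if !p.is_null() {
            BYTES_FREED.fetch_add(layout.size(), Ordering::Relaxed);
            BYTES_ALLOCATED.fetch_add(new_size, Ordering::Relaxed);
        }
        p
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// (total bytes allocated, total bytes freed) since program start.
fn totals() -> (usize, usize) {
    (
        BYTES_ALLOCATED.load(Ordering::Relaxed),
        BYTES_FREED.load(Ordering::Relaxed),
    )
}
```

A `Vec::with_capacity(4096)` that is later dropped adds at least 4096 to both counters. The wrapper learns the freed size from the `Layout` alone, without asking the underlying allocator.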
So, to be concrete, whenever we produce a Rust executable, everyone would prefer that the executable, along with any C code that it uses, would just use jemalloc everywhere (or whatever allocator we decide is ‘efficient’ tomorrow). However, in some cases we can’t control what allocator other code will use, for example when a Rust library is linked into a larger C program. In this case, we can opt to continue using jemalloc from within that Rust code, but the C program may simply use the normal allocator, and then we wind up with two allocators in use. This is where the trade-off comes into play. Is it better to have Rust use jemalloc even when the C program within which Rust is embedded does not? In that case, the Rust allocations are more efficient, but at the cost of having more than one global allocator, with the associated inefficiencies. I think this is the core question.

Two extreme designs

Depending on whether you want to prioritize using a single allocator or using an efficient allocator, there are two extreme designs one might advocate for the Rust standard library:

- When Rust needs to allocate memory, just call malloc and friends.
- Compile Rust code to invoke jemalloc directly. This is what we currently do. There are many variations on how to do this. Regardless of which variation you take, this has the downside that when Rust code is linked into C code, the C code may use one allocator and the Rust code another.

It’s important to clarify that what we’re discussing here is, to some extent, really the default behavior. The Rust standard library already isolates the definition of the global allocator into a particular crate. End users can opt to change the definition of that crate. However, it would require recompiling Rust itself to do so, which is at least a mild pain.

Calling malloc

If we opted to default to just calling malloc, this does not mean that end users are locked into the libc allocator or anything like that.
There are existing mechanisms for changing what allocator is used at a global level (though I understand this is relatively hard on Windows). Presumably, when we produce an actual Rust executable, we would default to using jemalloc. Calling malloc has the advantage that if a Rust library is linked into a C program, both of them will be using the same global allocator, whatever it is (unless of course that C program itself doesn’t call malloc). However, one downside of this is that we are not able to take advantage of the more advanced jemalloc APIs for sized deallocation and reallocation. This has a measurable effect in micro-benchmarks. I am not aware of any measurements on larger-scale Rust applications, but there are definitely scenarios where the advanced APIs are useful. Another potential downside of this approach is that malloc is called via indirection (because it is part of libc; I’m a bit hazy on the details of this point, and would appreciate clarification). This implies a somewhat higher overhead for calls to malloc/free than if we fixed the allocator ahead of time. It’s worth noting that this is the normal setup that all C programs use by default, so relative to a typical C program, this setup carries no overhead. (When compiling a statically linked executable, rustc could opt to redirect malloc and friends to jemalloc at this point, which would eliminate the indirection overhead but not take advantage of the specialized jemalloc APIs. This would be a simplified variant of the hybrid scheme I eventually describe below.)

Calling jemalloc directly

If we opt to hardcode Rust’s default allocator to be jemalloc, we gain several advantages. The performance of Rust code, at least, is not subject to the whims of whatever global allocator the platform or end user provides. We are able to take full advantage of the specialized jemalloc APIs.
Finally, as the allocator is fixed to jemalloc ahead of time, static linking scenarios do not carry the additional overhead that calling malloc implies (though, as I noted, one can remove that overhead also when using malloc via a simple hybrid scheme). Having Rust code unilaterally call jemalloc also carries downsides. For example, if Rust code is embedded as a library, it will not adopt the global allocator of the code that it is embedded within. This carries the performance downsides of multiple allocators but also a certain amount of risk, because a pointer allocated on one side cannot be freed on the other. (Some argue that freeing across sides is bad practice. That is certainly true if you do not know that the two sides are using the same allocator, but it is otherwise legitimate; see the section below for more details.) The same problem can also occur in reverse, when C code is used from within Rust. This happens today with rustc: due to the specifics of our setup, LLVM uses the system allocator, not the jemalloc allocator that Rust is using. This causes extra fragmentation and memory consumption. It’s also not great because jemalloc is better than the system allocator in many cases.

To prefix or not to prefix

One specific aspect of calling jemalloc directly concerns how it is built. Today, we build jemalloc using name prefixes, effectively “namespacing” it so that it does not interfere with the system allocator. This is what causes LLVM to use a different allocator in rustc. This has the advantage of clarity and side-steps certain footguns around dynamic linking that could otherwise occur, but at the cost of forking the allocators. A recent PR aimed to remove the prefix. It was rejected because, in a dynamic linking scenario, removing it creates a fragile situation. Basically, the dynamic library (“client”) defines malloc to be jemalloc. The host process also has a definition for malloc (the system allocator).
The precise result will depend on the flags and platform that you’re running on, but there are basically two possible outcomes, and both can cause perfectly legitimate code to crash:

- The host process wins, and malloc means the same thing everywhere (this occurs on Linux by default).
- malloc means different things in the host and the client (this occurs on Mac by default, and on Linux with the DEEPBIND flag).

In the first case, crashes can arise if the client code tries to intermingle usage of the nonstandard jemalloc API (which maps to jemalloc) with the standard malloc API (which the client believes to also be jemalloc, but which has been remapped to the system allocator by the host). The jemalloc documentation isn’t 100% explicit on the matter, but I believe it is legal for code to (e.g.) call mallocx and then call free on the result. Hence if Rust should link some C code that did that, it would crash under the first scenario. In the second case, crashes can arise if the host and client attempt to transfer ownership of memory. Some claim that this is not a legitimate thing to do, but that is untrue: it is (usually) perfectly legal for client code to (e.g.) call strdup and then pass the result back to the host, expecting the host to free it. (Granted, it is best to be cautious when transferring ownership across boundaries like this, and one should never call free on a pointer unless you can be sure of the allocator that was used to allocate that pointer in the first place. But if you are sure, then it should be possible.)

UPDATE: I’ve been told that on Windows, freeing across DLL boundaries is something you can never do. On Reddit, Mr_Alert writes: “In Windows, allocating memory in one DLL and freeing it in another is very much illegitimate. Different compiler versions have different C runtimes and therefore different allocators. Even with the same compiler version, if the EXE or DLLs have the C runtime statically linked, they’ll have different copies of the allocator.
So, it would probably be best to link rust_alloc to jemalloc unconditionally on Windows.” Given the number of differences between platforms, it seems likely that the best behavior will ultimately be platform-dependent. Fundamentally, the problems here are due to the fact that the client is attempting to redefine the allocator on behalf of the host. Forcing this kind of name conflict to occur intentionally seems like a bad idea if we can avoid it.

A hybrid scheme

There is also the possibility of various hybrid schemes. One such option that Alex Crichton and I put together, summarized in this gist, would be to have Rust call neither the standard malloc nor the jemalloc symbols, but rather an intermediate set of APIs (let’s call them rust_alloc). When compiling Rust libraries (“rlibs”), these APIs would be unresolved. These Rust allocator APIs would take all the information they need to take full advantage of extended jemalloc APIs, if they are available, but could also be “polyfilled” using the standard system malloc interface. So long as Rust libraries are being compiled into “rlibs”, these rust_alloc dependencies would remain unresolved. An rlib is basically a statically linked library that can be linked into another Rust program. At some point, however, a final artifact is produced, at which point the rust_alloc dependency must be fulfilled. The way we fulfill this dependency will ultimately depend on what kind of artifact is produced:

- Static library for use in a C program: link rust_alloc to malloc.
- Dynamic library (for use in C or Rust): link rust_alloc to malloc.
- Executable: resolve rust_alloc to jemalloc, and override the system malloc with jemalloc as well.

This seems to offer the best of both worlds. Standalone, statically linked Rust executables (the recommended, default route) get the full benefit of jemalloc. Code that is linked into C or dynamically loaded uses the standard allocator by default.
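Here is a rough Rust model of the shape of this scheme. Every name in it (`Artifact`, `Backend`, `resolve`, `RustAlloc`, `MallocPolyfill`) is hypothetical and not taken from the gist. It shows the core idea: library code is written against one interface that always passes sizes, and only the final artifact decides what backs that interface. The polyfill uses the standard library's `System` allocator. On Unix-like systems, `System` sits on top of the C library's malloc and free, so the size it is handed on deallocation goes unused.

```rust
use std::alloc::{GlobalAlloc, Layout, System};

/// Kinds of final artifact, following the list above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Artifact {
    StaticLibForC,
    DynamicLib,
    Executable,
}

/// What the unresolved rust_alloc symbols get bound to at the final link.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Backend {
    /// Plain malloc/free; size information is accepted but unused.
    SystemMalloc,
    /// jemalloc for Rust, plus jemalloc substituted for the system malloc.
    JemallocEverywhere,
}

fn resolve(artifact: Artifact) -> Backend {
    match artifact {
        Artifact::StaticLibForC | Artifact::DynamicLib => Backend::SystemMalloc,
        Artifact::Executable => Backend::JemallocEverywhere,
    }
}

/// Hypothetical intermediate API: callers always pass size and alignment,
/// so a backend with sized deallocation could take advantage of them.
trait RustAlloc {
    /// Safety: `size` must be nonzero and `align` a power of two.
    unsafe fn alloc(&self, size: usize, align: usize) -> *mut u8;
    /// Safety: `ptr` must come from `alloc` on this backend with the same size and align.
    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: usize);
}

/// The "polyfill": forward to the platform allocator.
struct MallocPolyfill;

impl RustAlloc for MallocPolyfill {
    unsafe fn alloc(&self, size: usize, align: usize) -> *mut u8 {
        unsafe { System.alloc(Layout::from_size_align_unchecked(size, align)) }
    }
    unsafe fn dealloc(&self, ptr: *mut u8, size: usize, align: usize) {
        unsafe { System.dealloc(ptr, Layout::from_size_align_unchecked(size, align)) }
    }
}
```

In the scheme as described, the binding happens at link time rather than through a runtime `match`; the enum only records which kind of artifact gets which backend. A jemalloc-backed `RustAlloc` is left out because it would need bindings to jemalloc itself.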
Any C code used from within Rust executables will also call into jemalloc. However, there is one major caveat. While it seems that this scheme would work well on Linux, the behavior on other platforms is different, and it’s not yet clear if the same scheme can be made to work as well on Mac and Windows. Naturally, even if we sort out the cross-platform challenges, this hybrid approach too is not without its downsides. It means that Rust code built for libraries will not take full advantage of what jemalloc has to offer, and in the case of dynamic libraries there may be more overhead per malloc invocation than if jemalloc were statically linked. However, by the same token, Rust libraries will avoid the overhead of using two allocators, and they will also be acting more like normal C code. And of course the embedding program may opt, in its linking phase, to redirect malloc (globally) to jemalloc.

So what should we do?

The decision about what to do has a couple of facets. In the immediate term, we need to take steps to improve rustc’s memory usage. It seems to me that, at minimum, we ought to accept strcat’s PR #18915, which ensures that Rust executables can use jemalloc for everything, at least on Linux. Everyone agrees that this is a desirable goal. Longer term, it is somewhat less clear. The reason that this decision is difficult is that there is no choice that is “correct” for all cases. The most performant choice will depend on the specifics of the case: Is the Rust code embedded? How much allocation takes place in Rust vs. in the other language? What allocator is the other language using? (As an example, the performance and memory use of rustc improved when we adopted jemalloc, even partially, but other applications will fare differently.) At this point I favor the general principle that Rust code, when compiled as a library for use within C code, should behave more or less like C code would.
This seems to suggest that, when building libraries for C consumption, Rust should just call malloc, and people can use the normal mechanisms to inject jemalloc if they so choose. However, when compiling Rust executables, it seems advantageous for us to default to a better allocator and to get the maximum efficiency we can from that allocator. The hybrid scheme aims to achieve both of these goals, but there may be a better way to go about it, particularly around the area of dynamic linking. I’d like to see more measurement regarding the performance impact of forgoing the specialized jemalloc APIs and using weak linking. I’ve seen plenty of numbers suggesting jemalloc is better than other allocators on the whole, and plenty of numbers saying that using specialized APIs helps in microbenchmarks. But it is unclear what the impact of such APIs (or weak linking) is on the performance of larger applications. I’d also like to get input from more people who have experience in this area. I’ve talked things over with strcat a fair amount, who generally favors using jemalloc even if it means two allocators. We’ve also reached out to Jason Evans, the author of jemalloc, who stressed that having multiple global allocators is generally a poor choice. I’ve tried to reflect their points in this post. Note, though, that whatever we decide, we can evolve it as we go. There is time to experiment and measure. One thing that is clear to me is that we do not want Rust to “depend on” jemalloc in any hard sense. That is, it should always be possible to switch from jemalloc to another allocator. This is both because jemalloc, good as it is, can’t meet everyone’s needs all the time, and because it’s just not a necessary dependency for Rust to take. Establishing an abstraction boundary around the “Rust global allocator” seems clearly like the right thing to do, however we choose to fulfill it.